Statistics Interview Questions and Answers

1. How is the statistical significance of an insight assessed?

Ans: Hypothesis testing is used to assess the statistical significance of an insight. The null hypothesis and the alternative hypothesis are stated, and a significance level (alpha) is chosen before the data are examined.

The p-value is then calculated under the assumption that the null hypothesis is true. If the p-value turns out to be less than alpha, the null hypothesis is rejected and the result is considered statistically significant.
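
As a minimal sketch of this procedure, assume a coin is flipped 10 times and lands heads 9 times (hypothetical numbers, not taken from any dataset). The null hypothesis is that the coin is fair, and the helper name `binomial_two_sided_p` is purely illustrative.

```python
from math import comb

def binomial_two_sided_p(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair (p = 0.5)."""
    expected = flips / 2
    observed_distance = abs(heads - expected)
    # Count every outcome at least as far from the expected value as the observed one.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )
    return extreme / 2 ** flips

alpha = 0.05  # significance level, fixed before looking at the data
p_value = binomial_two_sided_p(heads=9, flips=10)
reject_null = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Here only 22 of the 1024 equally likely outcomes are as extreme as the observed one, so the p-value falls below 0.05 and the null hypothesis is rejected.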

2. What is statistical power?

Ans: ‘Statistical power’ refers to the power of a binary hypothesis test, which is the probability that the test rejects the null hypothesis given that the alternative hypothesis is true. [2]
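
To make the definition concrete, here is a rough sketch for one specific case: a one-sided z-test with a known standard deviation. The function name `z_test_power` and the effect size, sigma, and sample sizes are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided z-test of H0: mu = mu0 against H1: mu = mu0 + effect."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha)  # rejection threshold under H0
    shift = effect * sqrt(n) / sigma            # mean of the test statistic under H1
    # Probability that the test statistic exceeds the threshold when H1 is true.
    return 1 - std_normal.cdf(z_critical - shift)

for n in (10, 30, 100):
    print(n, round(z_test_power(effect=0.5, sigma=1.0, n=n), 3))
```

With the effect size held fixed, the power grows as the sample size grows, which is why power calculations are commonly used to plan how much data to collect.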

3. What are the types of selection bias in statistics?

Ans: There are many types of selection bias as shown below:

  • Observer selection
  • Attrition
  • Protopathic bias
  • Time intervals
  • Sampling bias

4. How is missing data handled in statistics?

Ans: There are many ways to handle missing data in statistics:

  • Prediction of the missing values
  • Assignment of individual (unique) values
  • Deletion of rows, which have the missing data
  • Mean imputation or median imputation
  • Using models that can handle missing values directly, such as some random forest implementations
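
Two of the options above, deletion and mean or median imputation, can be sketched in a few lines. The `ages` column and the helper names `impute` and `drop_missing` are hypothetical, and missing entries are assumed to be stored as `None`.

```python
from statistics import mean, median

# Hypothetical column with missing entries represented as None.
ages = [23, None, 31, 27, None, 45, 29]

def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

def drop_missing(values):
    """Deletion: keep only the entries that are present."""
    return [v for v in values if v is not None]

print(impute(ages, "mean"))
print(impute(ages, "median"))
print(drop_missing(ages))
```

Deletion shrinks the dataset, while imputation keeps its size but reduces the apparent spread, so the choice depends on how much data is missing and why.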

5. What is exploratory data analysis?

Exploratory data analysis is the process of performing investigations on data to understand the data better.

In this, initial investigations are done to determine patterns, spot abnormalities, test hypotheses, and also to check if the assumptions are right.

6. Can you give an example of root cause analysis?

Root cause analysis, as the name suggests, is a method used to solve problems by first identifying the root cause of the problem.

Example: If a higher crime rate in a city is associated with higher sales of red-colored shirts, the two are positively correlated. However, this does not mean that one causes the other.

Causation can be tested with a controlled experiment, such as A/B testing.

7. What are the four main things we should know before studying data analysis?

Descriptive statistics

Inferential statistics

Distributions (normal distribution / sampling distribution)

Hypothesis testing


8. What is the difference between inferential statistics and descriptive statistics?

Ans: Descriptive statistics – summarizes the data that was actually collected, so the figures it reports are exact for that data.

Inferential statistics – uses information from a sample to reach a conclusion about the population, so its conclusions carry uncertainty.


9. What is the difference between population and sample in inferential statistics?

Ans: The population is the entire group of interest, and a sample is a subset drawn from it. We usually cannot work on the whole population, either because of computational cost or because not all of its data points are available.

From the sample we calculate the statistics.

From the sample statistics we draw conclusions about the population.


10. What are descriptive statistics?

Ans: Descriptive statistics are used to describe the data (data properties).

The five-number summary is one of the most commonly used descriptive statistics.

11. What are the most common characteristics used in descriptive statistics?

  • Center – middle of the data. Mean / Median / Mode are the most commonly used as measures.
    • Mean – average of all the numbers
    • Median – the middle number when the data is sorted
    • Mode – the number that occurs the most. The disadvantage of using Mode is that there may be more than one mode.
  • Spread – How the data is dispersed. Range / IQR / Standard Deviation / Variance are the most commonly used as measures.
    • Range = Max – Min
    • Inter Quartile Range (IQR) = Q3 – Q1
    • Standard Deviation (σ) = √(∑(x − µ)² / n)
    • Variance = σ²
  • Shape – the shape of the data can be symmetric or skewed
    • Symmetric – the part of the distribution on the left side of the median mirrors the part on the right side of the median
    • Left skewed – the left tail is longer than the right side
    • Right skewed – the right tail is longer than the left side 
  • Outlier – An outlier is an abnormal value
    • Keep the outlier based on judgement
    • Remove the outlier based on judgement
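
The center and spread measures above can be computed with Python's standard `statistics` module. This is only a sketch on made-up numbers; the mean-versus-median comparison at the end is a rough heuristic for skew, not a formal test.

```python
from statistics import mean, median, multimode, pstdev, pvariance

data = [2, 4, 4, 5, 7, 9, 9, 30]  # hypothetical values; 30 is a possible outlier

center = {
    "mean": mean(data),
    "median": median(data),
    "modes": multimode(data),  # returns every mode, so ties are visible
}
spread = {
    "range": max(data) - min(data),
    "std_dev": pstdev(data),   # population formula: divides by n
    "variance": pvariance(data),
}
# A mean well above the median hints at a right-skewed shape.
shape = "right skewed" if center["mean"] > center["median"] else "left skewed or symmetric"

print(center, spread, shape)
```

This data has two modes (4 and 9), which illustrates the drawback of the mode mentioned above, and the single large value pulls the mean above the median.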

12. What is the difference between population parameters and sample statistics?

  • Population parameters are:
    • Mean = µ
    • Standard deviation = σ
  • Sample statistics are:
    • Mean = x̄
    • Standard deviation = s

13. Why do we need sample statistics?

Population parameters are usually unknown, so we estimate them with sample statistics.


14. How do we estimate the mean length of all fish in the sea?

Ans: Define the confidence level (most common is 95%).

Take a random sample of fish from the sea (for better results, use more than 30 fish).

Calculate the sample mean length and the sample standard deviation of the lengths.

Find the critical t-value for the chosen confidence level and n − 1 degrees of freedom.

Build the confidence interval, sample mean ± t × s / √n, which is the range expected to contain the mean length of all the fish.
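
The steps above can be sketched as follows. Note one deliberate substitution: Python's standard library has no t-distribution, so this sketch uses the standard normal critical value as a large-sample approximation in place of the t critical value. The fish lengths are invented, and `mean_confidence_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Uses the standard normal critical value as a large-sample stand-in
    for the t critical value with n - 1 degrees of freedom.
    """
    n = len(sample)
    x_bar = mean(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev() uses the n - 1 formula
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = critical * standard_error
    return x_bar - margin, x_bar + margin

# Hypothetical fish lengths in centimetres (n = 36).
lengths = [20 + (i % 6) + 0.5 * (i % 4) for i in range(36)]
low, high = mean_confidence_interval(lengths)
print(f"95% CI for the mean length: ({low:.2f}, {high:.2f}) cm")
```

For small samples the t critical value is noticeably larger than the normal one, so the interval from this sketch would be somewhat too narrow there.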

15. What Is P-value?

Ans: In statistical significance testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. The null hypothesis is rejected if the p-value is less than the chosen significance level, commonly 0.05 or 0.01, which corresponds respectively to a 5% or 1% chance of rejecting the null hypothesis when it is true.

16. What Are Sampling Methods?

Ans: There are four main sampling methods:

  • Simple random (purely random)
  • Systematic (every kth member of the population)
  • Cluster (population divided into groups or clusters, and whole clusters are sampled)
  • Stratified (population divided into exclusive groups or strata, with a sample taken from each group)
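
A minimal sketch of the four methods, using a made-up population of 100 numbered members. The strata, cluster sizes, and sample sizes are arbitrary choices for illustration.

```python
import random

rng = random.Random(42)           # seeded so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population of 100 member IDs

# Simple random: every member has the same chance of selection.
simple = rng.sample(population, 10)

# Systematic: random start, then every k-th member.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified: split into exclusive strata, then sample from each one.
strata = {"low": population[:50], "high": population[50:]}
stratified = [m for group in strata.values() for m in rng.sample(group, 5)]

# Cluster: split into clusters, pick whole clusters at random, keep all their members.
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
cluster_sample = [m for c in rng.sample(clusters, 2) for m in c]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```

The key contrast is between the last two: stratified sampling takes some members from every group, while cluster sampling takes every member from some groups.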

17. Give An Example Of Inferential Statistics.

Ans: Example of inferential statistics:

You asked five of your classmates about their height. On the basis of this information, you stated that the average height of all students in your university or college is 67 inches.

18. What is the primary goal of A/B testing?

Ans: A/B testing is a statistical hypothesis test that compares two variants, A and B. The primary goal of A/B testing is to identify whether a change to a web page increases the outcome of interest. It is a useful method for finding the most suitable online promotional and marketing strategies for a business, and it is used for testing everything from website copy to sales emails and search ads.
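
One common way to analyze an A/B test on conversion rates is a two-proportion z-test. The sketch below assumes that approach; the visitor and conversion counts are invented, and `ab_test_p_value` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def ab_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a two-proportion z-test for H0: both rates are equal."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # shared rate under H0
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical results: 120 of 2400 visitors converted on A, 156 of 2400 on B.
p_value = ab_test_p_value(120, 2400, 156, 2400)
print(f"p-value = {p_value:.4f}")
```

If the p-value is below the significance level chosen in advance, the difference between the two page versions is treated as statistically significant.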

19. What is the meaning of sensitivity, and how is it calculated?

Sensitivity is used to validate the accuracy of a classifier (logistic regression, SVM, random forest, etc.). It is calculated as Correctly Predicted True Events / Total Events, where the correctly predicted true events are the ones that are actually true and are also predicted as true by the model.
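
The calculation can be sketched directly from that ratio. The labels below are hypothetical, with 1 marking an event and 0 a non-event.

```python
def sensitivity(actual, predicted):
    """True positive rate: correctly predicted events / all actual events."""
    true_positives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    total_events = sum(1 for a in actual if a == 1)
    return true_positives / total_events

# Hypothetical labels: 1 marks an event, 0 a non-event.
actual    = [1, 1, 1, 1, 0, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 1, 0, 1]
print(sensitivity(actual, predicted))  # 4 of the 5 actual events are caught -> 0.8
```

Note that the false alarm at the sixth position does not affect sensitivity, because sensitivity only looks at the cases that are actually events.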

20. What is the meaning of correlation and covariance in statistics?

Ans: Correlation and covariance are two mathematical concepts that are widely used in statistics. Both describe the relationship between two random variables and measure the dependency between them. Although they serve a similar purpose, they are quite different from each other.

  • Correlation: It measures and estimates the quantitative relationship between two variables, that is, how strongly the two variables are related. It is covariance rescaled by the standard deviations of both variables, so it always lies between −1 and 1 and does not depend on units.
  • Covariance: It is a measure of the extent to which two random variables change together, where a change in one variable is accompanied by a corresponding change in the other. Its value depends on the units of the variables.
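
A small sketch of both measures, written out by hand using the sample (n − 1) formulas. The paired measurements are hypothetical.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance: average co-movement around the means (n - 1 divisor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled to the range [-1, 1]."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical paired measurements.
hours  = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

print(covariance(hours, scores))
print(correlation(hours, scores))
# Rescaling one variable changes the covariance but not the correlation.
minutes = [h * 60 for h in hours]
print(covariance(minutes, scores), correlation(minutes, scores))
```

Converting hours to minutes multiplies the covariance by 60 but leaves the correlation unchanged, which is the practical difference between the two measures.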

22. What is the meaning of six sigma in statistics?

Ans: Six sigma is a quality assurance methodology used widely in statistics to provide ways to improve processes and functionality when working with data.

A process is considered six sigma when 99.99966% of its outcomes are defect-free, which corresponds to about 3.4 defects per million opportunities.

23. What is DOE?

Ans: DOE is an acronym for design of experiments. It is the design of a task that describes how the output information changes in response to changes in the independent input variables.

24. What is the meaning of KPI in statistics?

Ans: KPI stands for key performance indicator. It is used as a reliable metric to measure the success of a company in achieving its required business objectives.

There are many good examples of KPIs:

  • Profit margin percentage
  • Operating profit margin
  • Expense ratio

25. What is the meaning of the five-number summary in statistics?

Ans: The five-number summary is a set of five values that together cover the entire range of the data, as shown below:

  • Low extreme (Min)
  • First quartile (Q1)
  • Median
  • Upper quartile (Q3)
  • High extreme (Max)
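
A short sketch of the summary using the standard library. Several conventions exist for computing quartiles; this example assumes the "inclusive" method of `statistics.quantiles`, and other tools may give slightly different Q1 and Q3 values. The data is made up.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

values = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # hypothetical data
low, q1, med, q3, high = five_number_summary(values)
print(low, q1, med, q3, high)
print("IQR =", q3 - q1)
```

The same five values are what a box plot draws, and the difference between Q3 and Q1 gives the interquartile range.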

Next up on this top Statistics Interview Questions and answers blog, let us take a look at the intermediate set of questions.

26. What are the types of sampling in statistics?

Ans: There are four main types of data sampling as shown below:

  • Simple random: Purely random selection
  • Cluster: Population divided into clusters
  • Stratified: Data divided into unique groups
  • Systematic: Picks every nth member of the data

27. What are the various branches of statistics?

Ans: Statistics has two main branches, namely:

  • Descriptive Statistics: This summarizes the data from the sample by making use of an index like the mean or standard deviation. The methods used in descriptive statistics are displaying, organizing, and describing the data.
  • Inferential Statistics: This draws conclusions from data that are subject to random variation, such as observation errors and sampling variation.