CAIIB ABM Module A Unit 4 MCQs – Correlation and Regression. Explore CAIIB ABM Module A Unit 4 concepts through these 58 multiple-choice questions. This set of MCQs focuses on correlation and regression analysis, essential statistical tools for banking and finance professionals.

Question 1: What is the primary purpose of correlation analysis when examining the relationship between two variables?
Show Explanation
Correct Answer: B. To measure the strength and direction of the linear association between the variables. Correlation analysis quantifies how closely, and in which direction, two variables move together; it does not by itself explain why they do so.
Question 2: What does a scatter diagram reveal when data points form a pattern where one variable increases as the other variable also increases?
Show Explanation
Correct Answer: C. A direct relationship between the variables. When one variable increases as the other also increases, the points on the scatter diagram slope upward from left to right, which indicates a direct (positive) relationship.
Question 3: If a scatter diagram displays data points scattered randomly without a discernible pattern, what does this indicate?
Show Explanation
Correct Answer: D. An absence of correlation between the variables. Points scattered at random, with no discernible upward or downward pattern, indicate that the two variables are not correlated.
Question 4: What conclusion can be drawn from a scatter diagram in which data points closely align along a straight line?
Show Explanation
Correct Answer: B. The variables have a strong linear relationship. The more closely the points cluster along a straight line, the stronger the linear relationship between the variables.
Question 5: What is the primary function of the correlation coefficient (r) in statistical analysis?
Show Explanation
Correct Answer: B. To measure the strength and direction of a linear relationship between two variables. The sign of ‘r’ gives the direction (direct or inverse) and its absolute value gives the strength.
Question 6: Given ∑x = 35, ∑y = 40, ∑xy = 180, and N = 7, calculate the covariance between X and Y. The means of X and Y are 5 and 5.71 respectively.
Show Explanation
Correct Answer: C. -2.85. Covariance (X, Y) = (∑xy / N) – (mean of X × mean of Y) = (180 / 7) – (5 × 5.71) = 25.71 – 28.55 = -2.84. With the unrounded mean of Y (40 / 7 ≈ 5.714) the result is -2.857, so the closest option is -2.85.
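The shortcut formula used in Question 6 can be checked with a short Python sketch. The function name `covariance_from_sums` is illustrative only and not part of the CAIIB material; it assumes the population form of covariance (division by N), as in the explanation above.

```python
def covariance_from_sums(sum_x, sum_y, sum_xy, n):
    """Population covariance via the shortcut: mean(xy) - mean(x) * mean(y)."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    return sum_xy / n - mean_x * mean_y


# Figures from Question 6: sum of x = 35, sum of y = 40, sum of xy = 180, N = 7
cov_q6 = covariance_from_sums(sum_x=35, sum_y=40, sum_xy=180, n=7)
print(round(cov_q6, 3))  # -2.857
```

Working with unrounded means gives -2.857, which is why the hand calculation with the mean of Y rounded to 5.71 lands on -2.84.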
Question 7: Using ∑x² = 275 and N = 7, and the mean of X is 5, calculate the standard deviation of X.
Show Explanation
Correct Answer: B. 3.78. Standard deviation of X = √((∑x²/N) – (mean of X)²) = √((275/7) – (5)²) = √(39.29 – 25) = √14.29 = 3.78
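As a quick check of the arithmetic in Question 7, the following Python sketch applies the same formula. The helper name `std_from_sum_of_squares` is an assumption made for illustration.

```python
import math


def std_from_sum_of_squares(sum_x_sq, mean_x, n):
    """Population standard deviation from the sum of squares and the mean."""
    variance = sum_x_sq / n - mean_x ** 2
    return math.sqrt(variance)


# Figures from Question 7: sum of x squared = 275, mean of X = 5, N = 7
sd_q7 = std_from_sum_of_squares(sum_x_sq=275, mean_x=5, n=7)
print(round(sd_q7, 2))  # 3.78
```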
Question 8: If the covariance between X and Y is 50, the standard deviation of X is 5, and the standard deviation of Y is 10, determine the correlation coefficient.
Show Explanation
Correct Answer: B. 1. Correlation coefficient = Covariance / (Standard deviation of X * Standard deviation of Y) = 50 / (5 * 10) = 50 / 50 = 1
Question 9: Which value of the correlation coefficient indicates the strongest linear relationship, regardless of direction?
Show Explanation
Correct Answer: D. -0.95. The strength is determined by the absolute value. |-0.95| is the greatest.
Question 10: What does a correlation coefficient of 0 imply about the linear relationship between two variables?
Show Explanation
Correct Answer: C. The variables do not have a linear relationship. A coefficient of 0 rules out a linear association only; the variables may still be related in a non-linear way.
Question 11: If the correlation coefficient ‘r’ between two variables X and Y is calculated using the formula r = cov(X,Y) ⁄ (σₓσᵧ), what does cov(X,Y) represent?
Show Explanation
Correct Answer: B. The average of the products of deviations of X and Y from their means. Covariance, denoted as cov(X,Y), is a measure of how much two variables change together. It is calculated as the sum of the products of the differences of each variable from its mean, divided by the number of observations. For example, if you have data for the number of hours studied (X) and marks obtained (Y) for a set of students, the covariance would indicate whether students who study more tend to get higher marks.
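The definition of covariance as an average of products of deviations is equivalent to the shortcut form used in Question 6. A brief derivation, assuming the population form (division by N) and writing x̄ and ȳ for the means:

```latex
\begin{aligned}
\operatorname{cov}(X,Y)
  &= \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y}) \\
  &= \frac{1}{N}\sum_{i=1}^{N}x_i y_i
     - \bar{x}\,\frac{1}{N}\sum_{i=1}^{N}y_i
     - \bar{y}\,\frac{1}{N}\sum_{i=1}^{N}x_i
     + \bar{x}\,\bar{y} \\
  &= \frac{1}{N}\sum_{i=1}^{N}x_i y_i - \bar{x}\,\bar{y}
\end{aligned}
```

The last step uses the fact that the average of the y values is ȳ and the average of the x values is x̄, so the two middle terms each equal x̄ȳ and one of them cancels against the final term.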
Question 12: The correlation coefficient ‘r’ always lies between which two values?
Show Explanation
Correct Answer: C. -1 and 1. The value of the correlation coefficient ‘r’ is always between -1 and 1, inclusive. A value of 1 indicates a perfect positive linear relationship, a value of -1 indicates a perfect negative linear relationship, and a value of 0 indicates no linear relationship. For instance, if the price of a commodity increases, and the demand decreases proportionally, the correlation coefficient would be -1. If they move in the same direction proportionally, it would be 1.
Question 13: What does a correlation coefficient of r=−1 indicate about the relationship between two variables?
Show Explanation
Correct Answer: C. There is a perfect inverse linear relationship between the variables. A correlation coefficient of -1 means that as one variable increases, the other variable decreases by a proportional amount, and all the data points lie exactly on a straight line with a negative slope. An example could be the relationship between the number of hours spent watching television and the hours spent studying for an exam, assuming a perfect inverse relationship exists where increased TV time directly leads to decreased study time.
Question 14: Calculate the covariance (cov(X, Y)) for the following data, given N=7:
∑(x−x̄)(y−ȳ) = 897.61
Show Explanation
Correct Answer: A. 128.23. The covariance cov(X,Y) is calculated as ¹⁄₇∑(x−x̄)(y−ȳ). Given ∑(x−x̄)(y−ȳ) = 897.61 and N=7, the covariance is 897.61⁄7 = 128.23. For example, if X is the number of units produced and Y is the cost, and for 7 observations, the sum of the products of deviations from the mean is 897.61, the average product of deviations, or covariance, is 128.23.
Question 15: Calculate the standard deviation of X (σₓ) for the following data, given N=7:
∑(x−x̄)² = 1967.9
Show Explanation
Correct Answer: B. 16.77. The standard deviation of X (σₓ) is calculated as √(∑(x−x̄)²⁄N). Given ∑(x−x̄)² = 1967.9 and N=7, σₓ = √(1967.9⁄7) = √281.13 ≈ 16.77. For example, if X represents the heights of 7 students and the sum of the squared differences from the mean height is 1967.9, the standard deviation of their heights is approximately 16.77 units.
Question 16: Using the following information, calculate the correlation coefficient ‘r’:
cov(X,Y)=128.23
σₓ=19.84
σᵧ=7.71
Show Explanation
Correct Answer: A. 0.8383. The correlation coefficient ‘r’ is calculated as r = cov(X,Y) ⁄ (σₓσᵧ). Given cov(X,Y)=128.23, σₓ=19.84, and σᵧ=7.71, the correlation coefficient is r = 128.23 ⁄ (19.84 × 7.71) = 128.23 ⁄ 152.97 ≈ 0.8383. This value close to 1 indicates a strong positive linear relationship between the two variables. For instance, if X is rainfall and Y is crop yield, a coefficient of 0.8383 suggests that higher rainfall is strongly associated with higher crop yields.
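The calculation can be reproduced with a minimal Python sketch. The function name `correlation_coefficient` is hypothetical; the inputs are the figures quoted in Questions 16 and 8.

```python
def correlation_coefficient(cov_xy, sd_x, sd_y):
    """r = cov(X, Y) / (sd_x * sd_y); both standard deviations must be positive."""
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    return cov_xy / (sd_x * sd_y)


# Question 16: cov = 128.23, sd of X = 19.84, sd of Y = 7.71
r_q16 = correlation_coefficient(128.23, 19.84, 7.71)
print(round(r_q16, 4))  # 0.8383

# Question 8: cov = 50, sd of X = 5, sd of Y = 10
r_q8 = correlation_coefficient(50, 5, 10)
print(r_q8)  # 1.0
```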
Question 17: In regression analysis, what is the purpose of drawing a line that best represents the points in a scatter diagram?
Show Explanation
Correct Answer: B. To find the equation of the relationship between the variables. Regression analysis describes the relationship between two variables; to obtain the equation of that relationship, a line that best represents the points in the scatter diagram is fitted.
Question 18: The technique of least squares is used in regression analysis for what purpose?
Show Explanation
Correct Answer: B. To minimise the difference between observed and estimated values. The technique of least squares chooses the line for which the sum of the squared vertical distances between the points in the scatter diagram and the corresponding points on the line is as small as possible.
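A small Python sketch can make the least-squares idea concrete. The five data points below are invented purely for illustration and do not come from the CAIIB text; `fit_line` and `sse` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared differences between observed y and the line a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only to keep the arithmetic simple
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sse(a, b, xs, ys)
print(round(a, 2), round(b, 2), round(best, 2))  # 2.2 0.6 2.4

# Nudging the intercept or the slope away from the fitted values
# increases the sum of squared differences.
print(sse(a + 0.1, b, xs, ys) > best)  # True
print(sse(a, b - 0.1, xs, ys) > best)  # True
```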
Question 19: In the regression equation y=a+bx, what does the variable ‘b’ represent?
Show Explanation
Correct Answer: C. The slope of the line. In the equation y=a+bx, ‘b’ is the slope of the line, that is, the change in y for a one-unit change in x.
Question 20: In the regression equation y=a+bx, what does the variable ‘a’ represent?
Show Explanation
Correct Answer: D. The y-intercept. In the equation y=a+bx, ‘a’ is the y-intercept, which is the point where the line crosses the y-axis (the value of y when x=0).
Question 21: What does a small value of the Standard Error of Estimate (Se) indicate about the prediction?
Show Explanation
Correct Answer: B. The prediction is reasonably accurate. A small value of Se indicates that our estimates are fairly accurate.
Question 22: If the regression equation is given by y=39.96+0.3258x, what is the estimated value of y when x=70?
Show Explanation
Correct Answer: B. 62.77. Using the given equation y=39.96+0.3258x, when x=70, y=39.96+0.3258×70=39.96+22.81=62.77.
Question 23: Based on the data from Example 4.1, the covariance between X and Y is 128.23 and the variance of X (σₓ²) is 393.58. What is the value of ‘b’ in the regression equation y=a+bx?
Show Explanation
Correct Answer: A. 0.3258. The formula for ‘b’ is b=cov(X,Y)⁄σₓ². Given cov(X,Y)=128.23 and σₓ²=393.58, b=128.23⁄393.58=0.3258.
Question 24: Using the data from Example 4.1, if the mean of X (x̄) is 64.57 and the mean of Y (ȳ) is 61, and the calculated value of ‘b’ is 0.3258, what is the value of ‘a’ in the regression equation y=a+bx?
Show Explanation
Correct Answer: C. 39.96. The formula for ‘a’ is a=ȳ−bx̄. Given ȳ=61, b=0.3258, and x̄=64.57, a=61−(0.3258×64.57)=61−21.04=39.96.
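Questions 22 to 24 chain together: the slope comes from the covariance and variance, the intercept from the means, and the prediction from the resulting equation. A minimal Python sketch of that chain, using the Example 4.1 figures quoted above (the function name `fit_from_summary` is an assumption for illustration):

```python
def fit_from_summary(cov_xy, var_x, mean_x, mean_y):
    """Return (a, b) for y = a + b * x from summary statistics."""
    b = cov_xy / var_x          # slope: b = cov(X, Y) / variance of X
    a = mean_y - b * mean_x     # intercept: a = mean of Y - b * mean of X
    return a, b


# Figures quoted in Questions 23 and 24
a, b = fit_from_summary(cov_xy=128.23, var_x=393.58, mean_x=64.57, mean_y=61)

# Question 22: estimate y when x = 70
y_hat = a + b * 70

print(round(b, 4), round(a, 2), round(y_hat, 2))  # 0.3258 39.96 62.77
```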
Question 25: If the standard error of estimate (Se) is 5 and the estimated value of y (ŷ) for a given x is 62.77, what is the 65 per cent confidence interval for the actual value of y?
Show Explanation
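Correct Answer: A. (57.77, 67.77). The 65 per cent confidence interval for the actual value of y is taken here as (ŷ−Se, ŷ+Se). Given ŷ = 62.77 and Se=5, the interval is (62.77−5, 62.77+5)=(57.77, 67.77). Strictly, if the errors are normally distributed, a band of one standard error on either side of the estimate covers about 68 per cent of values.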
Question 26: If the standard error of estimate (Se) is 5 and the estimated value of y (ŷ) for a given x is 62.77, what is the 95 per cent confidence interval for the actual value of y?
Show Explanation
Correct Answer: B. (52.77, 72.77). 95 per cent confidence interval for the actual value of y lies between (ŷ−2Se) and (ŷ+2Se). Given ŷ = 62.77 and Se=5, the interval is (62.77−2×5, 62.77+2×5)=(62.77−10, 62.77+10)=(52.77, 72.77).
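Both intervals follow the same pattern, ŷ ± k × Se, with k = 1 in Question 25 and k = 2 in Question 26. A short Python sketch of that pattern (the name `prediction_band` is hypothetical):

```python
def prediction_band(y_hat, se, k):
    """Return (lower, upper) = (y_hat - k * se, y_hat + k * se)."""
    return (y_hat - k * se, y_hat + k * se)


# Question 25: one standard error either side of the estimate
low1, high1 = prediction_band(y_hat=62.77, se=5, k=1)
print(round(low1, 2), round(high1, 2))  # 57.77 67.77

# Question 26: two standard errors either side of the estimate
low2, high2 = prediction_band(y_hat=62.77, se=5, k=2)
print(round(low2, 2), round(high2, 2))  # 52.77 72.77
```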
Question 27: What type of relationship does the coefficient of correlation primarily measure?
Show Explanation
Correct Answer: B. Linear relationships. The coefficient of correlation is specifically designed to quantify the strength and direction of a straight-line association between two variables, not curved or other types of relationships. For example, it assesses if points on a scatter plot tend to fall along a straight line.
Question 28: If two variables have a perfect circular relationship on a scatter diagram, what would the value of the coefficient of correlation likely be?
Show Explanation
Correct Answer: C. 0. Even if variables have a clear pattern, if that pattern is not linear (like a circle), the coefficient of correlation, which measures linear association, will be zero. This signifies the absence of a linear trend, not necessarily the absence of any relationship.
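To see this numerically, the sketch below places twelve evenly spaced points on a unit circle (an arbitrary choice made for illustration) and computes ‘r’ from its definition. The helper `pearson_r` is an illustrative name, not a library function.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient: cov(X, Y) / (sd_x * sd_y), population form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Twelve points evenly spaced around a circle of radius 1
angles = [2 * math.pi * k / 12 for k in range(12)]
xs = [math.cos(t) for t in angles]
ys = [math.sin(t) for t in angles]

r_circle = pearson_r(xs, ys)
print(abs(r_circle) < 1e-9)  # True: r is zero up to floating-point rounding
```

The points follow an exact pattern, yet the coefficient is zero because the pattern has no linear trend.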
Question 29: Correlation analysis is most appropriately applied to which type of relationships?
Show Explanation
Correct Answer: C. Only linear relationships. Correlation analysis should be applied only to linear relationships because the coefficient measures the degree of linear association. Applying it to non-linear patterns can be misleading.
Question 30: What condition should the data meet to avoid misleading results in correlation analysis?
Show Explanation
Correct Answer: C. Homogeneous. The data used for correlation analysis should be homogeneous, meaning it should come from a single, consistent source or population to ensure the calculated correlation reflects a genuine relationship within that group.
Question 31: What term describes a high correlation coefficient value that appears due to analysing heterogeneous data, even if no actual relationship exists within subgroups?
Show Explanation
Correct Answer: A. Spurious correlation. This occurs when a correlation seems to exist in a combined dataset drawn from different groups (heterogeneous), but this correlation does not exist within the individual groups themselves. For example, combining shoe size data from children and adults might show a spurious correlation with reading ability.
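The effect can be reproduced with a tiny made-up dataset. In the Python sketch below, the two groups are invented so that ‘r’ is exactly zero within each group, yet the pooled data show a high coefficient; `pearson_r` is an illustrative helper name.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient: cov(X, Y) / (sd_x * sd_y), population form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Hypothetical group A: low values of both variables, no pattern within the group
ax, ay = [1, 1, 3, 3], [1, 3, 1, 3]
# Hypothetical group B: high values of both variables, no pattern within the group
bx, by = [11, 11, 13, 13], [11, 13, 11, 13]

r_a = pearson_r(ax, ay)
r_b = pearson_r(bx, by)
r_combined = pearson_r(ax + bx, ay + by)

print(r_a, r_b)              # 0.0 0.0
print(round(r_combined, 4))  # 0.9615
```

The high pooled value reflects only the difference between the two groups, not any relationship inside either of them.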
Question 32: A correlation coefficient value close to +1 indicates what kind of linear relationship?
Show Explanation
Correct Answer: C. Strong direct. A value near +1 signifies that the two variables have a strong tendency to increase together in a linear fashion. For example, height and weight in adults often show a strong direct correlation.
Question 33: Does a strong correlation coefficient (e.g., close to +1 or -1) definitively prove a cause-effect relationship between the two variables?
Show Explanation
Correct Answer: C. No, it only indicates association. A high correlation shows that variables move together, but it does not explain why. For instance, ice cream sales and crime rates might be positively correlated, but both are likely caused by a third variable (hot weather), not one causing the other.
Question 34: Consider the observation that over 20 years, both rice consumption in India and the number of road accidents increased, resulting in a high positive correlation. What does this illustrate?
Show Explanation
Correct Answer: B. Correlation does not imply causation. Although the two variables show a strong positive correlation, it is illogical to conclude that one causes the other; other factors likely contribute to the increase in both phenomena independently.
Question 35: What is the primary purpose of a scatter diagram in the context of correlation and regression?
Show Explanation
Correct Answer: C. To get an initial visual sense of the relationship between two variables. A scatter diagram plots observed data points and helps visually assess the pattern, direction, and strength of the potential relationship between variables before performing calculations.
Question 36: In regression analysis, what term is used for the variable that forms the basis of prediction?
Show Explanation
Correct Answer: D. Independent variable. The independent variable is the one whose values are known or controlled and used to predict the values of the other variable. For example, using years of experience (independent) to predict salary (dependent).
Question 37: What is the term for the variable that is being predicted in regression analysis?
Show Explanation
Correct Answer: C. Dependent variable. The dependent variable is the outcome or result that the analysis aims to predict based on the value(s) of the independent variable(s). For example, predicting crop yield (dependent) based on rainfall (independent).
Question 38: Which analysis technique involves fitting a line to represent the relationship between variables?
Show Explanation
Correct Answer: C. Regression analysis. Regression analysis specifically deals with finding the mathematical equation (often a line) that best describes the relationship between a dependent variable and one or more independent variables.
Question 39: What does the ‘Standard error of the estimate’ measure?
Show Explanation
Correct Answer: C. How accurate the predictions from a regression equation are likely to be. It quantifies the typical deviation between the observed values of the dependent variable and the values predicted by the regression model. A smaller standard error indicates better predictive accuracy.
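A minimal Python sketch of the calculation follows. It assumes the common convention for a two-variable regression, Se = √(∑(y − ŷ)² ⁄ (n − 2)); the data and the fitted line y = 2.2 + 0.6x are hypothetical values chosen for illustration, not figures from the CAIIB text.

```python
import math


def standard_error_of_estimate(xs, ys, a, b):
    """Se = sqrt(sum((y - y_hat)^2) / (n - 2)) for the line y_hat = a + b * x.

    Dividing by n - 2 is an assumed convention (two estimated coefficients).
    """
    n = len(xs)
    squared_errors = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared_errors / (n - 2))


# Hypothetical observations and their least-squares line y = 2.2 + 0.6x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

se = standard_error_of_estimate(xs, ys, a=2.2, b=0.6)
print(round(se, 4))  # 0.8944
```

A smaller value would mean the observed points sit closer to the line, so predictions from the equation would be more accurate.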
Question 40: What does correlation analysis primarily measure?
Show Explanation
Correct Answer: B. The strength of the relationship. Correlation analysis focuses on quantifying the degree and direction (positive or negative) of the linear association between two variables.
Question 41: Which of the following terms is directly related to measuring the strength of a linear relationship?
Show Explanation
Correct Answer: C. Coefficient of correlation. The coefficient of correlation (often denoted as ‘r’) is the specific statistical measure used to indicate the strength and direction of a linear relationship between two variables.
Question 42: Which of the following is NOT a limitation of the coefficient of correlation?
Show Explanation
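Correct Answer: D. It cannot be calculated if the sample size is small. The coefficient can be computed for a small sample, although the result is less reliable. Its genuine limitations are that it measures only linear relationships, can mislead when the data are heterogeneous, and does not establish cause and effect.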
Question 43: If a correlation coefficient is calculated as 0, what can be definitively concluded?
Show Explanation
Correct Answer: B. There is no linear relationship between the variables. A correlation coefficient of zero specifically indicates the absence of a linear association, but a non-linear relationship (like curvilinear) might still exist.
Question 44: Analysis aiming to identify the type of relationship pattern (e.g., straight line, curve) is referred to as:
Show Explanation
Correct Answer: C. Trend Analysis. Trend analysis involves examining patterns in data over time or across variables to identify the nature of the relationship, such as whether it is linear or curvilinear.
Question 45: What issue arises if one calculates a correlation coefficient using data combined from distinct groups without considering their differences?
Show Explanation
Correct Answer: C. Spurious correlation. Combining heterogeneous data can create an artificial correlation that doesn’t truly represent the relationship within the individual, homogeneous subgroups.
Question 46: What is the range of possible values for the coefficient of correlation?
Show Explanation
Correct Answer: C. -1 to +1. The coefficient of correlation ‘r’ is always bounded between -1 (perfect inverse linear relationship) and +1 (perfect direct linear relationship), inclusive.
Question 47: If variable A increases whenever variable B increases, but this is due to a third factor influencing both, concluding A causes B based on a high positive correlation is an example of:
Show Explanation
Correct Answer: D. Confusing correlation with causation. A strong correlation merely shows association. Attributing causality without further evidence, especially when a lurking variable might be involved, is a common statistical fallacy.
Question 48: Covariance is mentioned as a component related to calculating the:
Show Explanation
Correct Answer: C. Coefficient of correlation. The definition of the coefficient of correlation involves covariance in the numerator, measuring how two variables change together, standardized by their respective standard deviations.
Question 49: An investigation was conducted to understand if the frequency of certain weekend public events, like parades or rallies, was associated with the number of traffic accidents occurring on those same weekends. What was the core objective of this investigation?
Show Explanation
Correct Answer: C. To determine if a statistical relationship exists between the number of weekend events and the number of weekend accidents. The goal was to see if changes in one variable (events) corresponded systematically with changes in the other variable (accidents), specifically within the weekend timeframe.
Question 50: A manager needed a way to estimate factory overhead costs, noting these costs changed with production volume but not consistently (i.e., not purely fixed or purely variable). What was the primary practical reason for seeking a predictive formula?
Show Explanation
Correct Answer: C. To improve the accuracy of product pricing and budget for costs at different production levels. Accurate cost estimation is crucial for setting prices correctly and for financial planning, especially when costs don’t follow a simple fixed or variable pattern.
Question 51: When predicting costs based on related activities, using statistical techniques to find a ‘best-fitting equation’ is often preferred over simple intuition or guessing. What is a key benefit of this statistical approach regarding prediction accuracy?
Show Explanation
Correct Answer: C. It generally provides more reliable predictions by mathematically modeling the relationship and allows for quantifying the expected prediction error. Statistical models aim to capture the underlying trend, leading to better estimates than guesses, and also provide measures (like standard error) of how much the actual values might deviate from the prediction.
Question 52: To understand and quantify how a factory’s overhead costs typically change as the number of units produced changes, analysts often develop a mathematical expression representing this relationship. What is this process of finding the representative equation called?
Show Explanation
Correct Answer: B. Regression analysis. Regression analysis is the statistical method used specifically to model the relationship between a dependent variable (like overhead costs) and one or more independent variables (like production units) by fitting an equation (often a line) to the observed data.
Question 53: If a single data point from a study examining the link between the number of weekend public events (Variable X) and the number of weekend traffic accidents (Variable Y) is recorded as (10, 4), what does the value ‘4’ signify in this specific context?
Show Explanation
Correct Answer: B. The number of accidents recorded when 10 events occurred. In the pair (X, Y) representing (Events, Accidents), the second value corresponds to the measurement of the Y variable (Accidents) when the X variable (Events) was at the level specified by the first value.
Question 54: A factory recorded an instance where producing 40 units was associated with an overhead cost of 191. Based solely on this single piece of information, what overhead cost was recorded for this specific production run of 40 units?
Show Explanation
Correct Answer: C. 191. The question provides only one specific data pairing (40 units, cost 191) and asks what cost corresponds to that specific instance according to the provided statement.
Question 55: A factory observes that on one occasion, producing 40 units resulted in an overhead cost of 191, while on another occasion, producing 40 units resulted in an overhead cost of 178. What does this variation in cost for the same production level suggest?
Show Explanation
Correct Answer: C. Factors other than just the production unit count likely influence overhead costs, or there is inherent variability. If the relationship was perfectly simple and deterministic with production units as the sole driver, the cost should be identical for the same number of units. Variation implies other factors or randomness are involved.
Question 56: When analysing records of daily factory output (e.g., observing values like 30, 35, 37, 39, 40, 40, 42, 48, 53, 56 units over a period), identifying the highest figure, ’56’, serves what primary purpose in understanding production?
Show Explanation
Correct Answer: C. To identify the peak production level achieved during the observed period. The maximum value in a dataset represents the highest point reached for that variable within the scope of the observations.
Question 57: When examining records of daily factory overhead costs (e.g., observing values like 116, 153, 155, 170, 173, 178, 191, 234, 272, 280 over a period), what does identifying the lowest figure, ‘116’, reveal?
Show Explanation
Correct Answer: C. The minimum overhead cost recorded during that specific observation period. The minimum value in a dataset indicates the lowest point observed for that variable within the set of measurements taken.
Question 58: Analysts use a derived mathematical equation (like Overhead Cost = a + b × Production Units) to estimate future costs based on past data.
Show Explanation
Correct Answer: C. To predict the expected overhead cost associated with producing 50 units. The regression equation is specifically used for prediction – plugging in a value for the independent variable (production units = 50) yields an estimate for the dependent variable (overhead cost).