The A00-240 (SAS Certified Statistical Business Analyst Using SAS 9: Regression and Modeling) credential exam consists of 60 multiple-choice and short-answer questions to be completed in two hours. The passing score is 68%, and the exam costs \$180.

##### Q- 1

What is the default method in the LOGISTIC procedure to handle observations with missing data?

A. Missing values are imputed.
B. Parameters are estimated accounting for the missing values.
C. Parameter estimates are made on all available data.
D. Only cases with variables that are fully populated are used.

##### Q- 2

Refer to the exhibit.

These graphs were created using the GLM procedure with the plots(only)=diagnostics option. Which plot do you use to identify influential observations?

A. Cook's D by Observation
B. Residual by Quantile
C. Residual by Predicted
D. Fit – Mean and Residual Plot

##### Q- 3

This question will ask you to provide missing code segments. A logistic regression model was fit on a data set where 40% of the outcomes were events (TARGET=1) and 60% were non-events (TARGET=0). The analyst knows that the population where the model will be deployed has 5% events and 95% non-events. The analyst also knows that the company's profit margin for correctly targeted events is nine times higher than the company's loss for an incorrectly targeted non-event.

Given the following SAS program:

What X and Y values should be added to the program to correctly score the data?

A. X=40, Y=10
B. X=.05, Y=10
C. X=.05, Y=.40
D. X=.10, Y=.05
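The scenario above involves two separate adjustments: rescaling probabilities from the 40% sample event rate to the 5% population event rate, and choosing a decision cutoff from the 9:1 profit-to-loss ratio. The SAS program from the question is not reproduced here, so the following is only a Python sketch of that arithmetic, not the exam's code. The function names `adjust_for_priors` and `bayes_cutoff` are hypothetical, and the cutoff assumes that non-solicited customers contribute zero profit.

```python
def adjust_for_priors(p_sample, rho1, pi1):
    """Rescale a probability estimated on a sample with event share rho1
    to a population with event share pi1 (standard prior correction)."""
    num = p_sample * pi1 / rho1
    den = num + (1 - p_sample) * (1 - pi1) / (1 - rho1)
    return num / den


def bayes_cutoff(profit_to_loss_ratio):
    """Target a case when p * profit > (1 - p) * loss,
    which rearranges to p > 1 / (1 + profit / loss)."""
    return 1 / (1 + profit_to_loss_ratio)


# Profit for a correctly targeted event is nine times the loss for a
# wrongly targeted non-event.
cutoff = bayes_cutoff(9)

# A case scored at the sample base rate (0.40) maps back to the
# population base rate (0.05) after the prior correction.
p_pop = adjust_for_priors(0.40, rho1=0.40, pi1=0.05)

print(cutoff, p_pop)
```

The cutoff depends only on the profit ratio, while the prior correction depends only on the two event rates, so the two quantities can be reasoned about independently.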

##### Q- 4

The selection criterion used in the forward selection method in the REG procedure is:

B. SLE
C. Mallows' Cp
D. AIC

##### Q- 5

Refer to the following odds ratio table:

What is a correct interpretation of the estimate?

A. The odds of the event are 1.142 greater for each one dollar increase in salary.
B. The odds of the event are 1.142 greater for each one thousand dollar increase in salary.
C. The probability of the event is 1.142 greater for each one dollar increase in salary.
D. The probability of the event is 1.142 greater for each one thousand dollar increase in salary.
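The answer choices turn on the difference between odds and probability. The odds ratio table itself is not shown above, so the following Python sketch only illustrates that distinction: multiplying the odds by 1.142 does not multiply the probability by 1.142. The starting probability of 0.20 is a made-up value for illustration.

```python
def odds_to_prob(odds):
    """Convert odds p / (1 - p) back to a probability p."""
    return odds / (1 + odds)


odds_ratio = 1.142   # point estimate quoted in the answer choices
base_prob = 0.20     # hypothetical starting probability (assumption)

base_odds = base_prob / (1 - base_prob)
new_odds = base_odds * odds_ratio     # one-unit increase in the predictor
new_prob = odds_to_prob(new_odds)

# The odds scale by exactly the odds ratio; the probability scales by less.
print(round(new_odds / base_odds, 3), round(new_prob / base_prob, 3))
```

Which unit of salary (one dollar or one thousand dollars) the estimate refers to depends on the table in the exhibit, which this sketch does not address.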

##### Q- 6

Which statistic is based on the maximum vertical distance between the primary event EDF and the secondary event EDF?

A. KS
B. SBC
C. Max EDF
D. Brier Score
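To make "maximum vertical distance between two EDFs" concrete, here is a minimal Python sketch that computes that distance for two small sets of scores. The score values and the helper names `edf` and `max_edf_gap` are invented for illustration.

```python
def edf(sample, x):
    """Empirical distribution function: share of sample values <= x."""
    return sum(v <= x for v in sample) / len(sample)


def max_edf_gap(events, non_events):
    """Largest vertical gap between the two EDFs.

    Both EDFs are step functions that only change at observed values,
    so checking every observed score is enough to find the maximum.
    """
    grid = sorted(set(events) | set(non_events))
    return max(abs(edf(events, x) - edf(non_events, x)) for x in grid)


# Hypothetical model scores for the two outcome groups.
event_scores = [0.9, 0.8, 0.7, 0.4]
non_event_scores = [0.6, 0.3, 0.2, 0.1]

gap = max_edf_gap(event_scores, non_event_scores)
print(gap)
```

A gap of 0 means the two score distributions are indistinguishable, and a gap of 1 means they are completely separated.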

##### Q- 7

Assume a \$10 cost for soliciting a non-responder and a \$200 profit for soliciting a responder. The logistic regression model gives a probability score named P_R on a SAS data set called VALID. The VALID data set contains the responder variable Pinch, a 1/0 variable coded as 1 for the responder. Customers will be solicited when their probability score is more than 0.05.

Which SAS program computes the profit for each customer in the data set VALID?

A. Option A
B. Option B
C. Option C
D. Option D
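The four candidate programs are not reproduced above. As a rough guide to the logic they would need to encode, here is a Python sketch of a per-customer profit rule. It assumes that a solicited responder yields +200, a solicited non-responder yields -10, and a customer who is not solicited yields 0; the `responder` field is a stand-in for the data set's 1/0 responder variable, and the three rows are invented.

```python
SOLICIT_CUTOFF = 0.05      # solicit when the score is above this value
PROFIT_RESPONDER = 200     # profit for soliciting a responder
COST_NON_RESPONDER = 10    # cost for soliciting a non-responder


def customer_profit(p_r, responder):
    """Profit for one customer under the solicitation rule (assumed payoffs)."""
    if p_r <= SOLICIT_CUTOFF:
        return 0  # not solicited, so no profit and no cost
    return PROFIT_RESPONDER if responder == 1 else -COST_NON_RESPONDER


# Hypothetical rows standing in for the VALID data set.
valid = [
    {"P_R": 0.30, "responder": 1},
    {"P_R": 0.10, "responder": 0},
    {"P_R": 0.02, "responder": 1},
]

profits = [customer_profit(row["P_R"], row["responder"]) for row in valid]
print(profits, sum(profits))
```

Note that the third customer is a responder but earns nothing, because the score falls below the cutoff and no solicitation is sent.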

##### Q- 8

Within PROC GLM, the interaction between the two categorical predictors, Income and Gender, was shown to be significant. An item store was saved from the GLM analysis. Which statement from PROC PLM would test the significance of Gender within each level of Income and adjust for multiple tests?

B. slice Income*Gender / sliceby=Gender adjust=tukey;
C. slice Income*Gender / sliceby=Income adjust=tukey;

##### Q- 9

The PROC LOGISTIC options SELECTION=SCORE and BEST=2 are used in a MODEL statement to generate a series of predictive models. The models are assigned numbers in order from 1 to 99 reflecting the fact that there are 50 candidate input variables.

Results from the collection of derived models are used to generate the following plot of overall average profit by model number. Results are restricted to models with at least 9 inputs and at most 40 inputs.

The maximum value for the training data occurs for model number 46, and the maximum value for the validation data occurs for model number 43. If you base model selection solely on overall average profit, what is the correct choice?

A. Select model 46
B. Select model 43
C. Select model 45
D. Select model 21

##### Q- 10

Which SAS program will detect collinearity in a multiple regression application?

A. Option A
B. Option B
C. Option C
D. Option D
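The candidate programs are not shown above. One common collinearity diagnostic is the variance inflation factor (VIF), which PROC REG can report through options on the MODEL statement. The Python sketch below illustrates the idea for the special case of exactly two predictors, where the VIF reduces to 1 / (1 - r^2) with r the correlation between them. The data values and function names are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


x1 = [1, 2, 3, 4, 5]
x2_related = [2.1, 3.9, 6.2, 8.0, 9.8]   # nearly a multiple of x1
x2_unrelated = [1, 0, -2, 0, 1]          # uncorrelated with x1

vif_high = vif_two_predictors(x1, x2_related)
vif_low = vif_two_predictors(x1, x2_unrelated)
print(vif_high, vif_low)
```

A VIF near 1 indicates little overlap between predictors, while a value above roughly 10 is a widely used rule of thumb for problematic collinearity.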

##### Q- 11

While building a predictive model, median imputations are performed while preparing the training data. How should the imputations be addressed in the validation data?

A. The imputed values are irrelevant to the validation data and are not used.
B. The imputed values must be applied directly to the validation data without recalculation.
C. The imputed values must be recalculated using the validation data.
D. The imputed values must be recalculated using both the training and the validation data.
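The options differ in where the imputation value comes from. The Python sketch below shows one widely used convention, under which the median is estimated once on the training data and the same value is reused to fill missing entries in the validation data, so that validation mimics how the model would score new cases. The data values are invented for illustration.

```python
from statistics import median

# Hypothetical input variable with missing values marked as None.
train = [12.0, None, 7.0, 9.0, None, 30.0]
valid = [None, 4.0, 100.0, None]

# The imputation value is learned from the training data only.
train_median = median(v for v in train if v is not None)


def impute(values, fill):
    """Replace missing entries with a fixed fill value."""
    return [fill if v is None else v for v in values]


train_imputed = impute(train, train_median)
# The validation data reuses the training median; nothing is recomputed.
valid_imputed = impute(valid, train_median)

print(train_median, valid_imputed)
```

Recomputing the median on the validation rows would let information from the holdout data leak into the preparation step, which makes the validation assessment less honest.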

##### Q- 12

Refer to the exhibit:

SAS output from the RSQUARE selection method, within the REG procedure, is shown. The top two models in each subset are given. Based on the exhibit, which statement is true?

A. The AIC champion model is more parsimonious than the SBC champion.
B. The SBC champion model is more parsimonious than the AIC champion.
C. The R-Square champion model is the most parsimonious.
D. Adjusted R-Square and R-Square agree on the champion model.

##### Q- 13

A confusion matrix is created for data that were oversampled due to a rare target. What values are not affected by this oversampling?

A. Sensitivity and PV+
B. Specificity and PV-
C. PV+ and PV-
D. Sensitivity and Specificity 
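To see how oversampling interacts with these measures, the Python sketch below compares a confusion matrix at population proportions with one where the event cases are oversampled by a factor of 10. The counts are invented, and the sketch assumes that oversampling simply scales the event row (true positives and false negatives) while leaving the non-event row unchanged.

```python
def rates(tp, fn, fp, tn):
    """Four standard rates derived from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),   # computed within actual events
        "specificity": tn / (tn + fp),   # computed within actual non-events
        "pv_plus": tp / (tp + fp),       # computed within predicted events
        "pv_minus": tn / (tn + fn),      # computed within predicted non-events
    }


# Hypothetical counts: 50 events and 950 non-events in the population.
population = rates(tp=40, fn=10, fp=190, tn=760)

# Events oversampled by a factor of 10; non-event counts unchanged.
oversampled = rates(tp=400, fn=100, fp=190, tn=760)

for name in population:
    print(name, round(population[name], 3), round(oversampled[name], 3))
```

Measures computed within a single actual class are unchanged when that class is scaled, whereas measures that mix counts from both actual classes shift with the event proportion.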