2) A variable that takes values “Yes” or “No” is [ Select ] [“Numerical”, “Nominal”, “Ordinal”].
3) The Advertising data set consists of the sales of a product in 200 different markets, along with the advertising budgets for the product in each of those markets for three different media: TV, radio, and newspaper. Please select the best choice from the dropdown for each blank.
a. Below is the correlation matrix for TV, radio, newspaper, and sales for the Advertising data.
[ Select ] [“newspaper”, “radio”, “TV”] has the strongest linear correlation with sales and [ Select ] [“TV”, “radio”, “newspaper”] has the weakest linear correlation with sales.
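The correlation matrix itself is not reproduced above, so here is a minimal sketch of how such a matrix is produced with pandas. The numbers below are synthetic stand-ins, not the real Advertising data; the column names simply mirror the question.

```python
# Sketch: building a correlation matrix like the one in Question 3a.
# Synthetic data for illustration only -- not the actual Advertising data set.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200  # the question describes 200 markets
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
newspaper = rng.uniform(0, 100, n)
# made-up sales: driven by TV and radio, with no newspaper effect
sales = 7 + 0.05 * tv + 0.2 * radio + rng.normal(0, 1, n)

ads = pd.DataFrame({"TV": tv, "radio": radio,
                    "newspaper": newspaper, "sales": sales})
corr = ads.corr()  # 4x4 symmetric matrix with 1.0 on the diagonal
print(corr.round(3))
```

Reading off the `sales` row (or column) of this matrix is how one identifies the strongest and weakest linear correlations asked for above.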
b. To find the relationship between sales and media advertising budgets, we ran a multiple linear regression and got the following outputs:

– Which medium does not significantly contribute to sales when the other media ads are in place? [ Select ] ["newspaper", "TV and radio", "none", "TV", "radio"]
– For the same spending, which medium can generate the biggest boost in sales while holding the others the same? [ Select ] ["TV", "newspaper", "radio", "intercept"]
– How much increase in sales is associated with a given increase in TV advertising, holding the others the same? [ Select ] ["< 0.0001", "0.046", "0.0014", "32.81"]
– Which value best describes how well the model fits the data? [ Select ] ["0.897", "570", "0.0014", "0.8599"]

4) Please fill in each blank by selecting the answer from the dropdown.
(a) For K-Means clustering, the "K" is for [ Select ] ["# of nearest neighbors", "# of observations", "# of folders", "# of clusters"] and it is [ Select ] ["predicted", "pre-selected"].
(b) K-Means clustering is [ Select ] ["supervised learning", "un-supervised learning"] and is [ Select ] ["for a clustering problem", "for a classification problem", "for a regression problem"].

5) When you run logistic regression models, which of the following are usually used as measures of performance for model selection? (check all that apply)
Group of answer choices:
R-Squared
Standard Error
AIC
p-value
Test Error Rate
Validation Overall Accuracy
MSE
Adjusted R-Squared

6) Supervised vs. unsupervised learning? Fill in the blanks from the dropdowns:
The main difference between supervised and unsupervised learning is about [ Select ] ["the training data set is large enough or not", "the model is parametric or non-parametric", "the response variable is categorical or not", "the response variable is labeled or not"]. [ Select ] ["Unsupervised", "Supervised"] learning utilizes labeled inputs, and [ Select ] ["test error rate", "simulation", "validation"] can be used to check the performance of the model considering possible overfitting. [ Select ] ["KNN", "K-Means"] is an example of supervised learning and [ Select ] ["KNN", "K-Means"] is an example of unsupervised learning.

7) Which of the following do not make explicit assumptions about the true distribution function of the data?
Group of answer choices:
Regression models
Parametric methods
Linear Discriminant Analysis
Non-parametric methods

8) KNN (K-Nearest Neighbours) is ____________.
(check all that apply)
Group of answer choices:
non-parametric
unsupervised learning
a clustering problem
parametric

9) Suppose you collected data to study the relationship between basketball shooting and the distance the player shoots from. Let X be the distance in feet, and Y be the shooting result (1 for a make and 0 for a miss). You fitted a logistic regression and found the estimated coefficients β̂0 = 2 and β̂1 = -0.2. Based on this model and these estimates, the probability of making a shot from a distance of 10 feet is [blank]. (keep 1 decimal place)

10) Which of the following is not true for decision trees:
Group of answer choices:
Trees can be displayed graphically, and are easily interpreted even by a non-expert.
Trees are robust.
Trees are very easy to explain to people.
Trees can be used both for regression and classification.
Trees can easily handle qualitative predictors without the need to create dummy variables.

11) [ Select ] ["Ridge Regression", "PCR", "Lasso", "PLS"] is a dimension reduction method, which first identifies a new feature

12) Bootstrap is a [blank1]. (select all that apply)
Group of answer choices:
resampling method without replacement
bagging technique
resampling method with replacement
validation process
classification technique

13) LOOCV stands for Leave-One-Out Cross-Validation. It has a couple of major advantages over the validation set approach. First, it has far less [ Select ] ["error", "bias"]. For a dataset with sample size n, LOOCV repeatedly fits the statistical learning method using training sets that contain [ Select ] ["50%", "70%", "30%", "all", "n-1"] observations. Second, in contrast to the validation approach, which yields different results when applied repeatedly due to [ Select ] ["less observations", "randomness"] in the training/validation set splits, performing LOOCV multiple times will yield the same results.
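The LOOCV procedure described in Question 13 can be sketched in code. This is a toy illustration with scikit-learn on synthetic data (not course materials): each of the n fits trains on n − 1 observations, and repeating the whole procedure gives identical results because there is no random splitting.

```python
# LOOCV sketch: n fits, each trained on n-1 observations, deterministic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(20, 1))                 # 20 synthetic observations
y = 3 + 2 * X[:, 0] + rng.normal(0, 1, size=20)      # linear signal plus noise

loo = LeaveOneOut()
# each fold trains on n-1 = 19 observations and tests on the 1 held out
scores = cross_val_score(LinearRegression(), X, y, cv=loo,
                         scoring="neg_mean_squared_error")
cv_mse = -scores.mean()  # LOOCV estimate of the test MSE

# unlike a random validation split, repeating LOOCV gives identical results
scores2 = cross_val_score(LinearRegression(), X, y, cv=loo,
                          scoring="neg_mean_squared_error")
print(len(scores), np.allclose(scores, scores2))
```

The `len(scores) == n` and the bit-identical second run are exactly the two properties the question asks about.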
Another cross-validation method we learned in this class is [ Select ] ["best subset selection", "k-Fold CV", "the validation set approach"]. Cross-validation can be used to check model performance for [ Select ] ["regression models only", "classification models only", "clustering models only", "any predictive modeling"].

14) We run variable selection and regularization procedures to obtain better [ Select ] ["prediction accuracy", "data set", "calculation"] and model interpretability for [ Select ] ["supervised learning", "both supervised and unsupervised learning", "unsupervised learning"]. We can use [ Select ] ["subset selection", "PCA", "K-Means clustering"] to identify a subset of predictors, and use [ Select ] ["shrinkage", "smoothing", "decision tree", "dendrogram"] to fit a linear regression model involving all predictors. Best subset selection starts with the null model, which contains [ Select ] ["all", "1", "multiple", "0"] predictor(s), and ends with the full model, which contains [ Select ] ["all", "multiple", "1", "0"] predictors. When p = 5 (5 predictors in the full model), the best subset selection algorithm will fit [ Select ] ["10", "5", "32", "20", "2"] models that contain exactly 1 predictor each, and pick the model having the smallest [ Select ] ["R-squared", "MSE/RSS"], or largest [ Select ] ["R-squared", "MSE/RSS"], among these models with exactly 1 predictor. At the end, it selects the single best model from the p + 1 picked models, the one with the smallest cross-validated prediction error measure such as [ Select ] ["Adjusted R-squared/AIC/BIC/Cp", "MSE", "R-squared", "RSS"].

15) Choose the most appropriate statistical technique for each analysis scenario below:
How influential are the seven predictor variables (i.e., percent urban population, GDP, birthrate, # of hospital beds, # of doctors, # of radios, and # of telephones) in predicting female life expectancy?
[ Choose ] Multiple Regression / KNN / PCA (Principal Component Analysis) / Logistic Regression / K-Means
To predict if a customer will churn in the next year based on the customer's profile and the history of existing customers.
[ Choose ] Multiple Regression / KNN / PCA (Principal Component Analysis) / Logistic Regression / K-Means
To classify a potential voter into various classes like "Will Vote", "Will Not Vote", "Will Vote for Party D", "Will Vote for Party R".
[ Choose ] Multiple Regression / KNN / PCA (Principal Component Analysis) / Logistic Regression / K-Means
To perform market segmentation by identifying subgroups of people who might be more receptive to a particular form of advertising or more likely to purchase a particular product.
[ Choose ] Multiple Regression / KNN / PCA (Principal Component Analysis) / Logistic Regression / K-Means
We have a large number of correlated predictors, and we need to find a smaller set of representative variables that explains most of the variability in the original set.
[ Choose ] Multiple Regression / KNN / PCA (Principal Component Analysis) / Logistic Regression / K-Means

16) Below is the boxplot of the test error rates from different statistical techniques applied to the same data set and analysis. Which method gave the best results?
Group of answer choices:
KNN-1
KNN-CV
LDA
Logistic
QDA

17) Please match the non-linear models with the corresponding descriptions. For each model, choose one of the following:
- compute the fit at a target point using only the nearby training observations.
- involve dividing the range of X into K distinct regions; within each region, a polynomial function is fit to the data, and these polynomials are constrained so that they join smoothly at the region boundaries, or knots.
- extend the linear model by adding extra predictors, obtained by raising each of the original predictors to a power.
- extend a standard linear model by allowing non-linear functions of each of the variables, while maintaining additivity.
- result from minimizing a residual sum of squares criterion subject to a smoothness penalty.

Polynomial Regression [ Choose ]
Regression Splines [ Choose ]
Smoothing Splines [ Choose ]
Local Regression [ Choose ]
GAM (Generalized Additive Model) [ Choose ]

18) Please fill in each blank by selecting the answer from the dropdown:
a. When the relationship between the DV and IVs is linear, it is usually better to use [ Select ] ["flexible", "inflexible"] statistical learning.
b. When the number of observations is very large and the number of predictors is very small, it is usually better to use [ Select ] ["flexible", "inflexible"] statistical learning.
c. When the variance of the error term is very large, it is usually better to use [ Select ] ["flexible", "inflexible"] statistical learning.

19) Below are two plots from a simulated data set. The plot on the left includes 4 curves, where the black curve is the simulated true model f. The plot on the right shows how MSE changes as the flexibility of the model increases.
Please select the best choice from each dropdown. First, select the matching color for the corresponding description of the curves in the left-hand panel:
a) The fit from a simple linear regression is [ Select ] ["blue", "green", "orange"];
b) The fit from a smoothing spline with 4 degrees of freedom is [ Select ] ["orange", "blue", "green"];
c) The fit from a smoothing spline with 8 degrees of freedom is [ Select ] ["orange", "green", "blue"].
In the right-hand panel:
d) [ Select ] ["the dashed line", "the red curve"] is for training MSE;
e) [ Select ] ["E", "D", "F", "A", "C", "B"] shows the test MSE for the best fitted curve;
f) the dashed line shows [ Select ] ["the training MSE", "the cross-validation MSE", "the minimum possible test MSE"].

20) A decision tree is a [ Select ] ["parametric", "non-parametric"], [ Select ] ["unsupervised", "supervised"] statistical learning algorithm. It can be utilized for [ Select ] ["both classification and regression tasks", "classification tasks only", "regression tasks only"].

---- Please answer these questions in a Word doc.
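As a worked illustration of the logistic-regression formula behind Question 9: the model gives P(Y = 1 | X) = 1 / (1 + e^-(β̂0 + β̂1·X)), and plugging in the estimates stated in the question (β̂0 = 2, β̂1 = -0.2) at X = 10 takes one line of arithmetic:

```python
# Worked check for Question 9: logistic regression probability.
import math

beta0_hat, beta1_hat = 2.0, -0.2  # estimates given in the question
x = 10                            # shot distance in feet

log_odds = beta0_hat + beta1_hat * x      # 2 + (-0.2)(10) = 0
p_make = 1 / (1 + math.exp(-log_odds))    # logistic (inverse-logit) function
print(round(p_make, 1))  # -> 0.5
```

Because the linear predictor is exactly 0 at 10 feet, the probability lands on the logistic curve's midpoint, 0.5.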