2) A variable that takes values “Yes” or “No” is [ Select ] [“Numerical”, “Nominal”, “Ordinal”].
3) The Advertising data set consists of the sales of that product in 200 different markets, along with advertising budgets for the product in each of those markets for three different media: TV, radio, and newspaper. Please select the best choice from the dropdown for each blank.
a. Below is the correlation matrix for TV, radio, newspaper, and sales for the Advertising data.
[ Select ] [“newspaper”, “radio”, “TV”] has the strongest linear correlation with sales and [ Select ] [“TV”, “radio”, “newspaper”] has the weakest linear correlation with sales.
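For reference, each entry of a correlation matrix like the one above is a Pearson correlation coefficient, which can be computed directly. The sketch below uses invented numbers, not the actual Advertising data:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical budgets and sales (NOT the real Advertising numbers)
tv = [10.0, 20.0, 30.0, 40.0]
sales = [12.0, 19.0, 33.0, 41.0]
print(pearson(tv, sales))  # close to 1: strong positive linear correlation
```

A value near +1 or −1 indicates a strong linear relationship; a value near 0 indicates a weak one, which is how the "strongest/weakest linear correlation with sales" blanks are judged.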
b. To find the relationship between sales and media advertising budgets, we ran a multiple linear regression and got the following outputs:

– Which media does not significantly contribute to sales when the other media ads are in place? [ Select ] [“newspaper”, “TV and radio”, “none”, “TV”, “radio”]
– Which media with the same spending can generate the biggest boost in sales while holding the others the same? [ Select ] [“TV”, “newspaper”, “radio”, “intercept”]
– How much increase in sales is associated with a given increase in TV advertising while holding the others the same? [ Select ] [“< 0.0001”, “0.046”, “0.0014”, “32.81”]
– Which value can best describe how well the model fits the data? [ Select ] [“0.897”, “570”, “0.0014”, “0.8599”]

4) Please fill in each blank by selecting the answer from the dropdown.
(a) For K-Means clustering, the “K” is for [ Select ] [“# of nearest neighbors”, “# of observations”, “# of folders”, “# of clusters”] and it is [ Select ] [“predicted”, “pre-selected”].
(b) K-Means clustering is [ Select ] [“supervised learning”, “un-supervised learning”] and is [ Select ] [“for a clustering problem”, “for a classification problem”, “for a regression problem”].

5) When you run logistic regression models, which of the following are usually used as measures of performance for model selection? (check all that apply)
– R-Squared
– Standard Error
– AIC
– p-value
– Test Error Rate
– Validation Overall Accuracy
– MSE
– Adjusted R-Squared

6) Supervised vs. unsupervised learning? Fill in the blanks from the dropdowns:
The main difference between supervised and unsupervised learning is about [ Select ] [“the training data set is large enough or not”, “the model is parametric or non-parametric”, “the response variable is categorical or not”, “the response variable is labeled or not”]. [ Select ] [“Unsupervised”, “Supervised”] learning utilizes labeled inputs, and [ Select ] [“test error rate”, “simulation”, “validation”] can be used to check the performance of the model considering possible overfitting. [ Select ] [“KNN”, “K-Means”] is an example of supervised learning and [ Select ] [“KNN”, “K-Means”] is an example of unsupervised learning.

7) Which of the following do not make explicit assumptions about the true distribution function of the data?
– Regression models
– Parametric methods
– Linear Discriminant Analysis
– Non-parametric methods

8) KNN (K-Nearest Neighbours) is ____________.
(check all that apply)
– non-parametric
– unsupervised learning
– a clustering problem
– parametric

9) Suppose you collected data to study the relationship between basketball shooting and the distance that the player shoots from. Let X be the distance in feet, and Y be the basketball shooting result (1 for a make and 0 for a miss). You fitted a logistic regression and found the estimated coefficients β̂₀ = 2 and β̂₁ = −0.2. Based on this model and the estimates, the probability of making a shot from 10 feet of distance is [blank]. (keep 1 decimal place)

10) Which of the following is not true for decision trees:
– Trees can be displayed graphically, and are easily interpreted even by a non-expert.
– Trees are robust.
– Trees are very easy to explain to people.
– Trees can be used both for regression and classification.
– Trees can easily handle qualitative predictors without the need to create dummy variables.

11) [ Select ] [“Ridge Regression”, “PCR”, “Lasso”, “PLS”] is a dimension reduction method, which first identifies a new feature

12) Bootstrap is a [blank1]. (select all that apply)
– resampling method without replacement
– bagging technique
– resampling method with replacement
– validation process
– classification technique

13) LOOCV stands for Leave-One-Out Cross-Validation. It has a couple of major advantages over the validation set approach. First, it has far less [ Select ] [“error”, “bias”]. For a dataset with sample size n, in LOOCV, we repeatedly fit the statistical learning method using training sets that contain [ Select ] [“50%”, “70%”, “30%”, “all”, “n-1”] observations. Second, in contrast to the validation approach, which will yield different results when applied repeatedly due to [ Select ] [“less observations”, “randomness”] in the training/validation set splits, performing LOOCV multiple times will yield the same results.
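The probability asked for in question 9 follows from plugging the stated estimates into the logistic function; a minimal sketch:

```python
import math

# Coefficient estimates stated in question 9
b0, b1 = 2.0, -0.2

def shot_probability(distance_ft):
    """Logistic model: P(Y = 1 | X = x) = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * distance_ft)))

# At x = 10: b0 + b1*10 = 2 - 2 = 0, so P = 1 / (1 + e^0) = 0.5
print(round(shot_probability(10), 1))  # 0.5
```

Because the linear predictor is exactly zero at 10 feet, the fitted probability is exactly one half.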
Another cross-validation method we learned in this class is [ Select ] [“best subset selection”, “k-Fold CV”, “the validation set approach”]. Cross-validation can be used to check model performance for [ Select ] [“regression models only”, “classification models only”, “clustering models only”, “any predictive modeling”].

14) We run variable selection and regularization procedures to have better [ Select ] [“prediction accuracy”, “data set”, “calculation”] and model interpretability for [ Select ] [“supervised learning”, “both supervised and unsupervised learning”, “unsupervised learning”]. We can use [ Select ] [“subset selection”, “PCA”, “K-Means clustering”] to identify a subset of predictors, and use [ Select ] [“shrinkage”, “smoothing”, “decision tree”, “dendrogram”] to fit a linear regression model involving all predictors. Best subset selection starts with the null model, which contains [ Select ] [“all”, “1”, “multiple”, “0”] predictor(s), and ends with the full model with [ Select ] [“all”, “multiple”, “1”, “0”] predictors. When p = 5 (5 predictors in the full model), the best subset selection algorithm will fit [ Select ] [“10”, “5”, “32”, “20”, “2”] models that contain exactly 1 predictor each, and pick the model having the smallest [ Select ] [“R-squared”, “MSE/RSS”], or largest [ Select ] [“R-squared”, “MSE/RSS”], among these models with exactly 1 predictor. At the end, it selects a single best model from the p + 1 picked models using the smallest cross-validated prediction error measure such as [ Select ] [“Adjusted R-squared/AIC/BIC/Cp”, “MSE”, “R-squared”, “RSS”].

15) Choose the most appropriate statistical technique for each analysis scenario below:
How influential are the seven predictor variables (i.e., percent urban population, GDP, birthrate, # of hospital beds, # of doctors, # of radios, and # of telephones) in predicting female life expectancy?
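The model counts in question 14 are binomial coefficients: C(5, 1) = 5 models with exactly one predictor, and 2⁵ = 32 models examined in total (null model through full model). A quick check:

```python
from math import comb

p = 5
one_predictor_models = comb(p, 1)  # choose 1 of the 5 predictors
# Summing C(p, k) over k = 0..p counts every subset of predictors
total_models = sum(comb(p, k) for k in range(p + 1))
print(one_predictor_models, total_models)  # 5 32
```

This exponential growth (2^p subsets) is why best subset selection becomes infeasible for large p, motivating stepwise and shrinkage alternatives.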
[ Choose ] [“Multiple Regression”, “KNN”, “PCA (Principal Component Analysis)”, “Logistic Regression”, “K-Means”]
To predict if a customer will churn in the next year based on the customer's profile and history of existing customers.
[ Choose ] [“Multiple Regression”, “KNN”, “PCA (Principal Component Analysis)”, “Logistic Regression”, “K-Means”]
To classify a potential voter into various classes like “Will Vote”, “Will not Vote”, “Will Vote to Party D”, “Will Vote to Party R”.
[ Choose ] [“Multiple Regression”, “KNN”, “PCA (Principal Component Analysis)”, “Logistic Regression”, “K-Means”]
To perform market segmentation by identifying subgroups of people who might be more receptive to a particular form of advertising or more likely to purchase a particular product.
[ Choose ] [“Multiple Regression”, “KNN”, “PCA (Principal Component Analysis)”, “Logistic Regression”, “K-Means”]
We have a large number of correlated predictors; we need to find a smaller new set of representative variables that explain most of the variability in the original set.
[ Choose ] [“Multiple Regression”, “KNN”, “PCA (Principal Component Analysis)”, “Logistic Regression”, “K-Means”]
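Question 13's claims about LOOCV (training sets of n − 1 observations; identical results on repeated runs) can be illustrated with a mean-only model. The data below is invented purely for the sketch:

```python
def loocv_mse(xs):
    """Leave one observation out at a time, predict it with the mean
    of the remaining n - 1 observations, and average the squared errors."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        train = xs[:i] + xs[i + 1:]     # n - 1 training observations
        pred = sum(train) / len(train)  # mean-only "model"
        total += (xs[i] - pred) ** 2
    return total / n

data = [2.0, 4.0, 6.0, 8.0]  # invented toy sample
# No random split is involved, so repeated runs give identical results
assert loocv_mse(data) == loocv_mse(data)
print(loocv_mse(data))
```

Unlike the validation set approach, there is no random train/validation split anywhere in the loop, which is exactly why LOOCV is deterministic.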
