# Homework 3: CS534 Machine Learning, Fall 2021

This homework will explore validation and model-selection procedures using regularized logistic regression models and nested cross validation.

## Problem 1 - Cross validation (60 points)

The elastic-net regularized logistic regression model is derived by maximizing the penalized log likelihood for samples $(x_i, g_i)$, $g_i \in \{1, 2\}$:

$$
\max_{\beta,\beta_0}\ \ell(\beta,\beta_0)=\max_{\beta,\beta_0}\ \frac{1}{N}\sum_{i=1}^{N}\bigl\{I(g_i=1)\log p(x_i)+I(g_i=2)\log\bigl(1-p(x_i)\bigr)\bigr\}-\lambda P_\alpha(\beta),
$$

where the regularization penalty

$$
P_\alpha(\beta)=(1-\alpha)\tfrac{1}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1
$$

is a function of the free parameters $\lambda$ and $\alpha$, and the indicator function $I(\cdot)$ equals 1 when its argument is true and 0 otherwise. The weight $\lambda$ determines the magnitude of the regularization, and the mixing parameter $\alpha$ determines the proportion of the penalty allocated to the $\ell_1$ and $\ell_2$ norms. The mixing parameter $\alpha$ is often not chosen through cross validation but set using intuition or experience, whereas the weight $\lambda$ is typically determined through cross validation.

### 1.a. Validation diagram (20 points)

Draw a diagram of the cross validation, indicating which samples are in the training, testing, and validation sets in each of the 5 nested folds. Depict both the inner and outer loops of the nested CV. Indicate clearly how the data is segmented and the fraction of the data used in each segment.

### 1.b. Model selection (20 points)

For each outer CV fold, generate a plot of the classification error on the validation set as a function of the sequence of $\lambda$ values (five plots total). Use the inner CV folds to calculate the mean error $\mu_E(\lambda)$ and its standard deviation $\sigma_E(\lambda)$ at each $\lambda$. Choose the optimal $\lambda^{*}$ as the largest $\lambda$ within one standard deviation of the minimum average error:

$$
\lambda_{\min}=\arg\min_{\lambda}\ \mu_E(\lambda),\qquad
\lambda^{*}=\max\{\lambda\}\ \text{subject to}\ \mu_E(\lambda)\le\mu_E(\lambda_{\min})+\sigma_E(\lambda_{\min}).
$$

In each plot, indicate $\mu_E(\lambda)$ with a solid black line and the interval $\mu_E(\lambda)\pm\sigma_E(\lambda)$ in red; mark $\mu_E(\lambda_{\min})$ as a blue point and $\mu_E(\lambda^{*})$ as a green point.
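The nested five-fold segmentation from 1.a and the one-standard-error selection rule from 1.b can be sketched as follows. This is a minimal outline, not a reference solution: `fit(X, y, lam)` and `error(model, X, y)` are hypothetical callables standing in for your own model-fitting and error code.

```python
import numpy as np

def one_se_lambda(mean_err, std_err, lambdas):
    """Largest lambda whose mean inner-CV error is within one standard
    deviation of the minimum mean error (the rule from part 1.b)."""
    i_min = int(np.argmin(mean_err))
    threshold = mean_err[i_min] + std_err[i_min]
    return max(lam for lam, mu in zip(lambdas, mean_err) if mu <= threshold)

def nested_cv(X, y, lambdas, fit, error, k_outer=5, k_inner=5, seed=0):
    """Outer loop: 5 held-out test folds. Inner loop: 5 validation folds
    used only to pick lambda*; the winning lambda is refit on the full
    outer-training set and scored once on the test fold."""
    rng = np.random.default_rng(seed)
    outer = np.array_split(rng.permutation(len(y)), k_outer)
    results = []
    for f in range(k_outer):
        test_idx = outer[f]
        train_idx = np.hstack([outer[g] for g in range(k_outer) if g != f])
        inner = np.array_split(train_idx, k_inner)
        errs = np.empty((k_inner, len(lambdas)))
        for v in range(k_inner):
            val_idx = inner[v]
            fit_idx = np.hstack([inner[u] for u in range(k_inner) if u != v])
            for j, lam in enumerate(lambdas):
                model = fit(X[fit_idx], y[fit_idx], lam)
                errs[v, j] = error(model, X[val_idx], y[val_idx])
        mu, sigma = errs.mean(axis=0), errs.std(axis=0)
        lam_star = one_se_lambda(mu, sigma, lambdas)
        model = fit(X[train_idx], y[train_idx], lam_star)
        results.append((lam_star, error(model, X[test_idx], y[test_idx])))
    return results
```

With 5 outer and 5 inner folds, each inner fit sees 4/5 × 4/5 = 16/25 of the data, each validation segment 4/25, and each test segment 1/5; these are the fractions your 1.a diagram should depict.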
### 1.c. Test error (20 points)

Generate a box plot of the five test errors generated by cross validation. Display the validation errors $\mu_E(\lambda^{*})$ in two additional box plots in the same graph (there are 20 training errors and 20 validation errors). Compare and discuss the errors.

## Problem 2 - Elastic net logistic regression (40 points)

Implement the elastic-net regression algorithm using the soft-thresholding and iteratively reweighted least-squares approach described in Friedman 2010. This problem can be solved by optimizing the penalized negative log likelihood

$$
\min_{\beta,\beta_0}\ \ell(\beta,\beta_0)=-\frac{1}{N}\sum_{i=1}^{N}\bigl\{I(g_i=1)\log p(x_i)+I(g_i=2)\log\bigl(1-p(x_i)\bigr)\bigr\}+\lambda P_\alpha(\beta),
$$

where the indicator function $I(\cdot)$ equals 1 when its argument is true and 0 otherwise, and $p(x_i)$ represents the class probability

$$
\Pr(G=1\mid X=x_i)=p(x_i)=\frac{1}{1+e^{-(\beta_0+x_i^{T}\beta)}}.
$$

A quadratic approximation $\ell_Q$ of this likelihood can be used to transform the problem into a weighted least-squares problem with weights $w_i$ and response $z_i$ (see Equation 10 in the Friedman paper):

$$
\ell_Q(\beta,\beta_0)=-\frac{1}{2N}\sum_{i=1}^{N}w_i\bigl(z_i-\beta_0-x_i^{T}\beta\bigr)^2,
$$

where

$$
z_i=\hat\beta_0+x_i^{T}\hat\beta+\frac{y_i-\hat p(x_i)}{w_i},\qquad
w_i=\hat p(x_i)\bigl(1-\hat p(x_i)\bigr).
$$

### 2.a. Solution path

Implement this algorithm using the mixing parameter $\alpha=0.95$ and plot the solution path $\beta(\lambda)$ for a sequence of 1000 logarithmically spaced $\lambda$ values in the range $\lambda\in[1,100]$. Label each feature $j$ using text on the plot at the point where $\beta_j$ enters the model (when this model coefficient becomes non-zero).

### Notes

- Understanding the approach at a high level is important. You are going to solve the quadratic approximation repeatedly using Equation 10 from the paper, and at each iteration you will recalculate $z_i$ and $w_i$ to update the quadratic approximation $\ell_Q$.
- You will generate a solution for each value of $\lambda$. A fixed number of iterations is the easiest approach for deciding when each problem is solved (50 worked for me).
- To accelerate convergence you can use a warm-starting technique. Start with the largest $\lambda$, which will have the fewest non-zero coefficients.
Use the solution at this regularization level as the starting point for the next-smallest $\lambda$, and so on.
- This problem has some numerical issues that need to be addressed. The weights $w_i$ can shrink to zero and cause NaN values where they are used in division. The parameters $\beta$ will continue to grow in an attempt to push the probabilities $p_i$ to zero or one; for this reason, you can clamp the probabilities to 0 or 1 once they are within a reasonable distance (say $10^{-5}$). The solutions may also become unstable when the regularization gets too small (this will be apparent in the solution-path plot).
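The pieces above (Equation 10, soft-thresholding, a fixed iteration count, warm starts, and probability clamping) fit together roughly as in the sketch below. This is a minimal illustration under stated assumptions, not a complete solution: labels $y$ are assumed coded 0/1, a single coordinate sweep is done per reweighting, and the coordinate update is the weighted soft-threshold form from Friedman 2010.

```python
import numpy as np

def soft(z, gamma):
    """Soft-thresholding operator S(z, gamma)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def enet_logistic_path(X, y, lambdas, alpha=0.95, n_iter=50, eps=1e-5):
    """Elastic-net logistic regression via IRLS + coordinate descent.
    `lambdas` should be sorted largest-first so each solution warm-starts
    the next; `y` is assumed coded 0/1."""
    N, p = X.shape
    b0, b = 0.0, np.zeros(p)
    path = []
    for lam in lambdas:
        for _ in range(n_iter):
            eta = b0 + X @ b
            with np.errstate(over="ignore"):
                prob = 1.0 / (1.0 + np.exp(-eta))
            prob = np.clip(prob, eps, 1.0 - eps)  # clamp per the note above
            w = prob * (1.0 - prob)               # IRLS weights w_i
            z = eta + (y - prob) / w              # working response z_i
            # one coordinate-descent sweep on the weighted LS problem (Eq. 10)
            for j in range(p):
                r = z - b0 - X @ b + X[:, j] * b[j]   # partial residual
                num = soft(np.mean(w * X[:, j] * r), lam * alpha)
                den = np.mean(w * X[:, j] ** 2) + lam * (1.0 - alpha)
                b[j] = num / den
            b0 = np.sum(w * (z - X @ b)) / np.sum(w)  # weighted intercept
        path.append((lam, b0, b.copy()))
    return path
```

For the solution-path plot in 2.a, call this with the 1000 logarithmically spaced λ values in decreasing order and record, for each feature, the first λ at which its coefficient in `path` becomes non-zero.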