Article

Robust Variable Selection and Regularization in Quantile Regression Based on Adaptive-LASSO and Adaptive E-NET

Department of Statistics, University of South Africa, Florida Campus, Private Bag X6, Florida Park, Roodepoort 1710, South Africa
* Author to whom correspondence should be addressed.
Computation 2022, 10(11), 203; https://doi.org/10.3390/computation10110203
Submission received: 30 May 2022 / Revised: 28 July 2022 / Accepted: 7 November 2022 / Published: 21 November 2022
(This article belongs to the Section Computational Engineering)

Abstract
Although variable selection and regularization procedures have been extensively considered in the literature for the quantile regression (QR) scenario via penalization, many such procedures fail to deal simultaneously with data aberrations in the design space, namely, high leverage points (X-space outliers) and collinearity challenges. Some high leverage points, referred to as collinearity influential observations, tend to adversely alter the eigenstructure of the design matrix by inducing or masking collinearity. Therefore, the literature recommends that the problems of collinearity and high leverage points be dealt with simultaneously. In this article, we suggest adaptive LASSO and adaptive E-NET penalized QR (QR-ALASSO and QR-AE-NET) procedures, where the weights are based on a QR estimator, as remedies. We extend this methodology to the penalized weighted QR versions, building on the WQR-LASSO and WQR-E-NET procedures we suggested earlier. In the literature, adaptive weights are based on the RIDGE regression (RR) parameter estimator. Although the use of this estimator may be plausible for the ℓ1 estimator (QR at τ = 0.5) under a symmetric distribution, it may not be so at extreme quantile levels. Therefore, we use a QR-based estimator to derive the adaptive weights. We carried out a comparative study of QR-LASSO and QR-E-NET against the procedures suggested here, viz., QR-ALASSO, QR-AE-NET and their weighted penalized counterparts (WQR-ALASSO and WQR-AE-NET). The simulation results show that QR-ALASSO, QR-AE-NET, WQR-ALASSO and WQR-AE-NET generally outperform their non-adaptive counterparts. At predictor matrices with collinearity-inducing points under normality, QR-ALASSO and QR-AE-NET outperform the non-adaptive procedures in the unweighted scenarios as follows: in all 16 cases (100%) with respect to correctly selected (shrunk) zero coefficients; in 88% of cases with respect to correctly fitted models; and in 81% with respect to prediction. In the weighted penalized WQR scenarios, WQR-ALASSO and WQR-AE-NET outperform their non-adaptive versions 75% of the time with respect to both correctly fitted models and correctly shrunk zero coefficients, and 63% of the time with respect to prediction. At predictor matrices with collinearity-masking points under normality, QR-ALASSO and QR-AE-NET outperform the non-adaptive procedures in the unweighted scenarios as follows: in prediction, 100% and 88% of the time, respectively; with respect to correctly fitted models, 100% and 50% of the time (performing equally in the remaining 50%); and with respect to correctly shrunk zero coefficients, 100% of the time. In the weighted scenario, WQR-ALASSO and WQR-AE-NET outperform their respective non-adaptive versions as follows: with respect to prediction, both 63% of the time; with respect to correctly fitted models, 88% of the time; and with respect to correctly shrunk zero coefficients, 100% of the time.
At predictor matrices with collinearity-inducing points under the t-distribution, QR-ALASSO and QR-AE-NET outperform their respective non-adaptive procedures in the unweighted scenarios as follows: in prediction, 100% and 75% of the time; with respect to correctly fitted models, 88% of the time each; and with respect to correctly shrunk zero coefficients, 88% and 100% of the time. Additionally, comparing WQR-ALASSO and WQR-AE-NET with their unweighted versions, the former outperform the latter in all respective cases with respect to prediction, while there is no clear "winner" with respect to the other two measures. Overall, the WQR-ALASSO generally outperforms all other models with respect to all measures. At the predictor matrix with collinearity-masking points under the t-distribution, all adaptive versions outperformed their respective non-adaptive versions with respect to all metrics. In the unweighted scenarios, QR-ALASSO and QR-AE-NET dominate their non-adaptive versions as follows: in prediction, 63% and 75% of the time; with respect to correctly fitted models, 100% and 38% of the time (performing equally in the remaining 62%); and with respect to correctly shrunk zero coefficients, 100% of the time. In the weighted scenarios, the adaptive versions are outperformed by their non-adaptive versions 62% of the time with respect to prediction, while the converse holds for the other two measures: WQR-ALASSO and WQR-AE-NET dominate their respective non-adaptive versions 62% of the time with respect to correctly fitted models and 100% of the time in both cases with respect to correctly shrunk zero coefficients. At the design matrix with both collinearity and high leverage points under heavy-tailed distributions (t-distributions with d ∈ {1, 6} degrees of freedom), the dominance of the adaptive procedures over the non-adaptive ones is again evident. In the unweighted scenarios, QR-ALASSO and QR-AE-NET outperform their non-adaptive versions as follows: in prediction, 75% and 62% of the time; with respect to correctly fitted models, 100% and 88% of the time; and with respect to correctly shrunk zero coefficients, 100% of the time in both cases. In the weighted scenarios, WQR-ALASSO and WQR-AE-NET dominate their non-adaptive versions as follows: with respect to prediction, 100% of the time in both cases; and with respect to both correctly fitted models and correctly shrunk zero coefficients, 88% of the time. Results from applications of the suggested procedures to real-life data sets are more or less in line with the simulation study results.

1. Introduction

Variable selection is at the heart of the model-building process. However, it is fraught with challenges stemming from data aberrations in both the X-space and the Y-space. X-space aberrations comprise extreme observations referred to as high leverage points, while Y-space aberrations are extreme residuals, referred to as outliers. High leverage points can be collinearity hiding or collinearity inducing points, referred to as collinearity influential observations [1]. The least squares (LS) method is susceptible to both X-space and Y-space data aberrations (non-Gaussian error terms), hence the need for robust procedures. In the literature, least absolute deviation (LAD)-based procedures [2] have been suggested as alternatives to LS, as they are robust in the Y-space. One such LAD-based procedure is quantile regression (QR) [3]. However, LAD-based procedures remain vulnerable to high leverage points. QR, which generalizes the LAD estimator to all quantile levels, is a very attractive procedure, as it provides more information about the conditional distribution of Y given X. As a result, it has generated considerable research interest in recent years, hence the need for robust variable selection procedures in the QR framework.
Subset selection has been a topical issue since the 19th century. However, subset selection procedures tend to be unstable and are especially unsuitable for variable selection when the dimension is high [4]. As a result, towards the end of the 20th century, the literature witnessed a proliferation of penalization (shrinkage) procedures as alternative variable/model selection and regularization tools. Penalization procedures tend to be fairly stable and produce lower prediction errors than subset selection procedures, so they have been suggested as remedies for the shortcomings of subset selection methods, in both the LS and QR scenarios, with varying degrees of success. These methods include RIDGE [5], LASSO [6], the elastic net (E-NET) [7], the nonnegative garrote [4,8] and their extended versions, to mention a few.
Although the LASSO penalty has the edge over the RIDGE penalty against overfitting (the RIDGE penalty does not shrink coefficient estimates to zero) and has attractive prediction properties, its model selection is only consistent under some restrictive assumptions [9]. Other drawbacks of the LASSO are the following: in the p > n case, where p and n are the numbers of predictors and observations, respectively, the LASSO selects at most n variables before it saturates; when the pairwise correlations within a group of variables are very high, the LASSO tends to select only one variable from the group, at random; and in the usual n > p case with highly correlated predictors, the prediction performance of the LASSO has been empirically observed to be dominated by RIDGE regression [7]. Zou and Hastie [7] suggested the E-NET procedure, which has a sparsity of representation similar to the LASSO while often outperforming it. The competitive edge of the E-NET over the LASSO is that it encourages a grouping effect, whereby strongly correlated predictors tend to be in or out of the model together, and it is particularly useful when p is much bigger than n.
The LASSO penalty performs poorly in the presence of high leverage points (especially collinearity influential points), and procedures such as the smoothly clipped absolute deviation (SCAD) [10] and the minimax concave penalty (MCP) [11] are better options. Weights-based regression methods that remedy the influence of both high leverage points and outliers have been suggested in the literature. In the LS scenario, weighted regression has been used in variable selection, for example, doubly adaptive penalized regression, which satisfies both sparsity and robustness properties [12]. In the QR scenario, weighted QR (WQR) has been suggested as a remedy for high leverage points [13]. Building upon this idea, Ranganai and Mudhombo [14] suggested a weighted QR LASSO (WQR-LASSO) variable selection and regularization parameter estimation procedure, which is robust to data aberrations in both the X- and Y-spaces.
Since the LASSO penalizes the parameter coefficient estimates equally and is only consistent under some restrictive assumptions, a dynamic variable selection and regularization approach has been suggested via the adaptive LASSO (ALASSO) penalty. Compared to the LASSO, the ALASSO enjoys the oracle properties, which guarantee optimal performance in large samples with high dimensionality [15], as well as the computational advantage of the LASSO owing to the efficient path algorithm [16]. The ALASSO procedure relies on suitably chosen weights.
Frommlet and Nuel [17] suggested the adaptive RIDGE (ARIDGE) procedure for variable selection, with interesting formulations of the adaptive weight. In the QR scenario, variable selection has been suggested via ALASSO-penalized QR (QR-ALASSO) [18]. Zou and Zhang [19] suggested the adaptive E-NET (AE-NET) procedure, which inherits good properties from both the ALASSO and ARIDGE penalty-based procedures.
This article suggests adaptive LASSO and adaptive E-NET penalized weighted QR (WQR-ALASSO and WQR-AE-NET) procedures in which the initial coefficient estimator used to compute the adaptive weights need not be consistent. This coefficient estimate is computed from WQR, in which the weights used to downweigh high leverage points are based on the minimum covariance determinant (MCD) estimator of Rousseeuw [20]. The WQR-ALASSO and WQR-AE-NET procedures extend Ranganai and Mudhombo [14]'s WQR-LASSO and WQR-E-NET procedures as mitigation against both high leverage and collinearity influential observations.
In summary, the motivations of this study are premised on the following:
  • Rather than carrying out an "omnibus" study of adaptive penalized QR, we carry out a detailed study by distinguishing different types of high leverage points under different distribution scenarios, viz.:
    Collinearity influential points, comprising collinearity inducing and collinearity masking ones.
    A “mixture” of collinearity and high leverage points that are not collinearity influential.
  • Unlike the conditional mean regression (LS) estimator, which is a global one, the regression quantile (RQ) estimator is a local one. Therefore, we suggest a QR-based estimator, instead of the RR parameter-based estimator suggested in the literature, to derive the adaptive weights in extending the QR-LASSO and QR-E-NET procedures to the QR-ALASSO and QR-AE-NET procedures, respectively.
  • We further extend the QR-ALASSO and QR-AE-NET procedures to the WQR-ALASSO and WQR-AE-NET procedures using the same methodology.
  • We carry out a comparative study of these models using simulation studies and well-known data sets from the literature.
The rest of the paper is organized as follows: In Section 2.1, a brief overview of QR, the adaptive weights, and the QR-ALASSO and QR-AE-NET procedures is given. The QR-ALASSO and QR-AE-NET procedures are extended to their WQR-ALASSO and WQR-AE-NET counterparts, respectively, in Section 2.2. Simulation results and examples are presented in Section 3. The simulations consider collinearity influential observations in the design matrix as well as normal and t-distributed error term scenarios with varying degrees of tail heaviness. The examples consider data sets from the literature with collinearity influential observations. Finally, concluding remarks are provided in Section 4.

2. Quantile Regression

Consider the linear regression model given by

$$y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \epsilon_i, \quad i \in [1:n], \tag{1}$$

where $y_i$ is the $i$th response random variable, $\mathbf{x}_i^{\top}$ the $i$th row of the design matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$, $\boldsymbol{\beta}$ a vector of parameters and $\epsilon_i \sim F$ the $i$th error term. The QR estimator [3] of the coefficient vector $\boldsymbol{\beta} \in \mathbb{R}^{p}$ in Equation (1) is based on an optimization problem solved via linear programming techniques. The QR minimization problem for estimating the parameter vector is given by

$$\hat{\boldsymbol{\beta}}(\tau) = \underset{\boldsymbol{\beta} \in \mathbb{R}^{p}}{\arg\min} \sum_{i=1}^{n} \rho_{\tau}\big(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}(\tau)\big), \tag{2}$$

with the check function

$$\rho_{\tau}(u) = \begin{cases} \tau u, & \text{if } u \ge 0 \\ (\tau - 1)u, & \text{if } u < 0 \end{cases}$$

denoting the re-weighting function of the residuals for $0 < \tau < 1$, where

$$u = y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}(\tau)$$

denotes a residual and $\hat{\boldsymbol{\beta}}(\tau)$ the $\tau$th RQ.
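For concreteness, the check function can be written directly in R; the following is a minimal sketch (illustrative only; in practice the τth RQ is computed via linear programming, e.g., with the quantreg package):

```r
# A minimal R sketch of the check (pinball) loss rho_tau and the QR objective.
# rho_tau(u) = tau*u for u >= 0 and (tau - 1)*u for u < 0, as in the display above.
check_loss <- function(u, tau) u * (tau - (u < 0))

# QR objective in Equation (2) for a candidate coefficient vector beta
qr_objective <- function(beta, X, y, tau) sum(check_loss(y - X %*% beta, tau))

# At tau = 0.5 the loss is |u|/2, so minimizing it recovers the LAD (median) fit
u <- c(-2, -1, 0, 1, 3)
check_loss(u, tau = 0.5)   # 1.0 0.5 0.0 0.5 1.5, i.e., |u|/2
```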

2.1. Variable Selection in Quantile Regression

In this section, we give an overview of QR variable selection using penalized procedures. We discuss penalized QR methods with LASSO [6] and E-NET [7] based penalties, as well as the ALASSO [16] and AE-NET [19] based penalties.
The LASSO-penalized QR procedure, denoted QR-LASSO, is a QR variable selection procedure that uses a LASSO ($\ell_1$) penalty. It is given by the minimization problem

$$\hat{\boldsymbol{\beta}}(\tau) = \underset{\boldsymbol{\beta} \in \mathbb{R}^{p}}{\arg\min} \sum_{i=1}^{n} \rho_{\tau}\big(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}(\tau)\big) + n\lambda \sum_{j=1}^{p} |\beta_j|, \tag{3}$$

where the tuning parameter $\lambda$ shrinks the $\beta$ coefficients towards zero equally. The second term is the penalty term, and the other terms are explained in Equations (1) and (2). The continuous shrinkage of the QR-LASSO procedure frequently enhances prediction precision via the bias-variance trade-off.
The QR variable selection procedure that uses an ALASSO penalty is denoted QR-ALASSO. The adaptive weights penalize different coefficients differently in the QR setting, thereby outperforming the LASSO penalty scenario. The QR-ALASSO procedure extends the LASSO penalty-based method; the construction of the adaptive weights is discussed in Section 2.1.1. The QR-ALASSO procedure selects variables by solving the minimization problem

$$\hat{\boldsymbol{\beta}}(\tau) = \underset{\boldsymbol{\beta} \in \mathbb{R}^{p}}{\arg\min} \sum_{i=1}^{n} \rho_{\tau}\big(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}(\tau)\big) + \lambda \sum_{j=1}^{p} \tilde{\omega}_j |\beta_j|, \tag{4}$$

where $\tilde{\omega}_j$ is the adaptive weight and the effective tuning parameter $\lambda_j = \tilde{\omega}_j \lambda$ shrinks the predictor coefficients to zero differently. The other terms are as defined in Equations (1) and (2). The QR-ALASSO tuning parameter is thus no longer the constant $\lambda$ but varies as $\lambda_j$ for $j = 1, 2, \ldots, p$.
We next present the penalized QR procedure based on [7]'s E-NET penalty (QR-E-NET). This penalized QR procedure combines the LASSO and RIDGE penalties. The E-NET-penalized QR is best suited to unidentified groups of predictors and is given by the minimization problem

$$\hat{\boldsymbol{\beta}}(\tau) = \underset{\boldsymbol{\beta} \in \mathbb{R}^{p}}{\arg\min} \sum_{i=1}^{n} \rho_{\tau}\big(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}(\tau)\big) + \alpha\lambda \sum_{j=1}^{p} |\beta_j| + (1 - \alpha)\lambda \sum_{j=1}^{p} \beta_j^2, \tag{5}$$

where $\alpha \in [0, 1]$ is the mixing parameter between RIDGE ($\alpha = 0$) and LASSO ($\alpha = 1$), and $\lambda$ is the tuning parameter for the second and third terms. This procedure outperforms its RIDGE and LASSO counterparts [7].
In a similar fashion to the extension of QR-LASSO (Equation (3)) to QR-ALASSO (Equation (4)), QR-E-NET (Equation (5)) can be extended to QR-AE-NET (Equation (6)) as in [19]. Suppose the adaptive weight $\tilde{\omega}_j$ is constructed as in Section 2.1.1. We define the QR-AE-NET estimator $\hat{\boldsymbol{\beta}}$ as

$$\hat{\boldsymbol{\beta}}(\tau) = \underset{\boldsymbol{\beta} \in \mathbb{R}^{p}}{\arg\min} \sum_{i=1}^{n} \rho_{\tau}\big(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}(\tau)\big) + \alpha\lambda \sum_{j=1}^{p} \tilde{\omega}_j |\beta_j| + (1 - \alpha)\lambda \sum_{j=1}^{p} \tilde{\omega}_j \beta_j^2, \tag{6}$$

where the terms are defined as in Equations (4) and (5). QR-AE-NET reduces to QR-ALASSO for $\alpha = 1$ and to QR-ARIDGE for $\alpha = 0$. QR-AE-NET inherits the desired optimal minimax bound from the ALASSO [16]. It is further anticipated that the procedure deals better with collinearity challenges due to the presence of the $\ell_2$ penalty.
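Because the adaptive penalties in Equations (4) and (6) differ from their non-adaptive counterparts only through the coefficient-specific weights ω̃_j, such fits can be sketched with elastic-net QR software that accepts per-coefficient penalty factors. The sketch below uses the hqreg package (also used for the simulations in Section 3); the penalty.factor argument and its application to both penalty terms are assumptions to verify against the package documentation, and w_tilde is a placeholder for the weights constructed in Section 2.1.1:

```r
library(hqreg)

# Toy data: n = 50, p = 8, true beta = (3, 1.5, 0, 0, 2, 0, 0, 0) as in Section 3
set.seed(1)
n <- 50; p <- 8
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(3, 1.5, 0, 0, 2, 0, 0, 0)) + rnorm(n)

tau <- 0.25
w_tilde <- rep(1, p)  # placeholder; adaptive weights from Equation (10)

# QR-LASSO (Equation (3)): alpha = 1, equal penalty on every coefficient
fit_lasso <- hqreg(X, y, method = "quantile", tau = tau, alpha = 1)

# QR-ALASSO (Equation (4)): coefficient-specific weights via penalty.factor
fit_alasso <- hqreg(X, y, method = "quantile", tau = tau, alpha = 1,
                    penalty.factor = w_tilde)

# QR-AE-NET (Equation (6)): alpha = 0.5 mixes the weighted l1 and l2 terms
fit_aenet <- hqreg(X, y, method = "quantile", tau = tau, alpha = 0.5,
                   penalty.factor = w_tilde)
```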

2.1.1. Choice of Adaptive Weights for ALASSO

Consider the general penalty term $\sum_{j=1}^{p} |\beta_j|^q$ used in most penalization problems. The adaptive penalty can be expressed as $\sum_{j=1}^{p} \omega_j |\beta_j|^q$, where the $\omega_j$s are the adaptive weights. When $q = 1$ we obtain an ALASSO penalty, and for $q = 2$ we obtain an adaptive RIDGE (ARIDGE) penalty. Ref. [16] suggested using the LS estimator of $\boldsymbol{\beta}$ to determine the adaptive weights $\omega_j$; the LS $\hat{\boldsymbol{\beta}}$ is suitable in the absence of collinearity. In the presence of collinearity, Ref. [16] suggested the RIDGE $\hat{\boldsymbol{\beta}}$ as a suitable replacement because it is more stable than its LS counterpart. In constructing the adaptive weights $\omega_j$ in the LS case, we suggest using the MCD-based weighted RIDGE regression (WRR) estimator

$$\hat{\boldsymbol{\beta}}_{WRR} = \underset{\boldsymbol{\beta} \in \mathbb{R}^{p}}{\arg\min} \sum_{i=1}^{n} \check{\omega}_i \big(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}\big)^2 + n\lambda \sum_{j=1}^{p} \beta_j^2, \tag{7}$$

where $\lambda$ is the tuning parameter, $\boldsymbol{\beta}$ is a vector of parameters and $\check{\omega}_i$ is a robust MCD-based weight [14]. Thus, in the LS case the weights are given by

$$\omega_j = \left( \big|\hat{\beta}_{WRR_j}\big| + 1/n \right)^{-\gamma}, \tag{8}$$

where $\hat{\beta}_{WRR_j}$ is the $j$th parameter estimate from the WRR penalized solution and $1/n$ is added to avoid division by zero. This adaptive weight is a special case of the weight $\omega = \big( |\hat{\beta}_{RR_j}|^{\gamma} + \delta^{\gamma} \big)^{(\theta - 2)/\gamma}$ proposed by [17], with $\theta = 1$, $\delta = 1/n$ and $\gamma = 1$.
Although the use of $\omega_j$ may be applicable at the $\ell_1$ estimator (the RQ at $\tau = 0.5$) for a symmetric distribution, it may not be applicable at extreme quantile levels in the presence of high leverage (and collinearity influential) points. This is because these atypical observations often result in RQ planes crossing (unequal slope parameter estimates; see [21,22]), although they are theoretically parallel. Therefore, instead of $\hat{\beta}_{WRR_j}$, we suggest using a WQR-based estimate, $\hat{\beta}(\tau)_{WQR_j}$, obtained from the minimization problem

$$\hat{\boldsymbol{\beta}}(\tau)_{WQR} = \underset{\boldsymbol{\beta} \in \mathbb{R}^{p}}{\arg\min} \sum_{i=1}^{n} \check{\omega}_i \rho_{\tau}\big(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}(\tau)\big) + n\lambda \sum_{j=1}^{p} \beta_j^2, \tag{9}$$

where $\hat{\boldsymbol{\beta}}(\tau)_{WQR}$ is the $\tau$th WQR-RIDGE estimator, and $\lambda$ and $\check{\omega}_i$ are as defined in Equations (2) and (7). The analogue of Equation (8) based on the WQR-RIDGE parameter estimates $\hat{\boldsymbol{\beta}}(\tau)_{WQR}$ becomes

$$\tilde{\omega}_j = \left( \big|\hat{\beta}(\tau)_{WQR_j}\big| + 1/n \right)^{-1}, \tag{10}$$

where $\hat{\beta}(\tau)_{WQR_j}$ is the $j$th parameter estimate from the WQR-RIDGE penalized solution at a specified quantile level $\tau$, and the other terms are defined in Equation (8). Therefore, for all variable selection in the simulations and applications, we use the WQR-based weights $\tilde{\omega}_j$ instead of the WRR-based $\omega_j$ ($\tilde{\omega}_j$ coincides with $\omega_j$ at the $\tau = 0.50$ RQ level); $\hat{\beta}(\tau)_{WQR_j}$ is more representative than $\hat{\beta}_{WRR_j}$ across all quantile levels $\tau$. To our knowledge, no such adaptive weight has been applied in a QR variable selection scenario. The adaptive weight has the advantage of not being influenced by extreme points, being adaptable to particular distributional settings (for example, the $t_1$ and $t_2$ distributions) and being applicable at all quantile levels $\tau$.
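A sketch of this weight construction is given below. The MCD weight form shown is one common choice and may differ from the exact weights in [14]; the hqreg field names ($lambda, $beta) and the use of alpha = 0 for a pure ridge penalty are likewise assumptions, and in practice λ would be chosen by 10-fold cross-validation as in Section 3.

```r
library(hqreg)
library(robustbase)

# MCD-based observation weights (one common construction; the exact form of the
# weights in [14] may differ): downweigh rows with large robust distances.
mcd <- covMcd(X)
d2  <- mahalanobis(X, mcd$center, mcd$cov)            # squared robust distances
w_check <- pmin(1, qchisq(0.975, df = ncol(X)) / d2)

# WQR-RIDGE (Equation (9)) via row scaling: since rho_tau is positively
# homogeneous, w_i * rho_tau(u_i) = rho_tau(w_i * u_i), so scaling (x_i, y_i)
# by w_check[i] reproduces the weighted check-loss term.
Xw <- w_check * X
yw <- w_check * y
fit_ridge <- hqreg(Xw, yw, method = "quantile", tau = 0.25, alpha = 0)

# Pick the estimate at a chosen lambda (by CV in practice; the index is assumed)
j <- which.min(abs(fit_ridge$lambda - 0.1))
beta_wqr <- fit_ridge$beta[-1, j]                     # drop the intercept row

# Adaptive weights of Equation (10)
w_tilde <- 1 / (abs(beta_wqr) + 1 / nrow(X))
```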

2.2. Adaptive Penalized Weighted Quantile Regression

Using the adaptive weights $\tilde{\omega}_j$ and the MCD-based weights $\check{\omega}_i$, we propose adaptive penalized WQR variable selection procedures. The WQR penalization problems arise from the ALASSO and AE-NET penalties, giving the WQR adaptive LASSO (WQR-ALASSO) and the WQR adaptive E-NET (WQR-AE-NET) estimators, respectively. The approach is robust in the presence of high leverage points (and collinearity influential observations) owing to the robustly chosen MCD-based weights, as in the WQR (see [14,23,24]).
We first write the proposed WQR-ALASSO as the minimization problem

$$\hat{\boldsymbol{\beta}}(\tau) = \underset{\boldsymbol{\beta} \in \mathbb{R}^{p}}{\arg\min} \sum_{i=1}^{n} \check{\omega}_i \rho_{\tau}\big(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}(\tau)\big) + \lambda \sum_{j=1}^{p} \tilde{\omega}_j |\beta_j|, \tag{11}$$

where the $\tilde{\omega}_j$s are the WQR-RIDGE parameter estimator-based adaptive weights, the $\check{\omega}_i$s are the MCD-based weights, and the other terms are as defined in Equations (2) and (3). The proposed adaptive penalized WQR-AE-NET procedure is the minimization problem

$$\hat{\boldsymbol{\beta}}(\tau) = \underset{\boldsymbol{\beta} \in \mathbb{R}^{p}}{\arg\min} \sum_{i=1}^{n} \check{\omega}_i \rho_{\tau}\big(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}(\tau)\big) + \alpha\lambda \sum_{j=1}^{p} \tilde{\omega}_j |\beta_j| + (1 - \alpha)\lambda \sum_{j=1}^{p} \tilde{\omega}_j \beta_j^2, \tag{12}$$

where the terms are defined in Equations (2), (9) and (11). Special cases of this construction are WQR-ALASSO for $\alpha = 1$ and WQR-ARIDGE for $\alpha = 0$. Just like its unweighted QR counterpart, WQR-AE-NET inherits the desired optimal minimax bound from the ALASSO [16]. The next section discusses the asymptotic properties of our proposed procedures.
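Putting the two weighting devices together gives a sketch of the proposed procedures in Equations (11) and (12); Xw, yw and w_tilde are as constructed in the previous sketches, and penalty.factor is again an assumed interface for supplying the ω̃_j:

```r
# WQR-ALASSO (Equation (11)): MCD weights enter via the row-scaled (Xw, yw),
# adaptive weights via per-coefficient penalty factors
fit_wqr_alasso <- hqreg(Xw, yw, method = "quantile", tau = 0.25, alpha = 1,
                        penalty.factor = w_tilde)

# WQR-AE-NET (Equation (12)): alpha = 0.5 mixes the adaptive l1 and l2 terms
fit_wqr_aenet <- hqreg(Xw, yw, method = "quantile", tau = 0.25, alpha = 0.5,
                       penalty.factor = w_tilde)
```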

2.3. Asymptotic Properties

In this section, we establish the oracle properties of the adaptive LASSO penalized QR. Consider the linear model in Equation (1) with $P(\epsilon_i < 0) = \tau$. Partition the columns of $\mathbf{X}$ as $\mathbf{X}_1$ (the first $p_0$ columns) and $\mathbf{X}_2$ (the remaining $p - p_0$ columns, the noise variables), such that $\boldsymbol{\beta} = (\boldsymbol{\beta}_1^{\top}, \boldsymbol{\beta}_2^{\top})^{\top}$, where the true regression coefficient vector $\boldsymbol{\beta}_1$ corresponds to the non-zero coefficients and $\boldsymbol{\beta}_2 = \mathbf{0}$.
For asymptotic normality, the following conditions are assumed to hold for a suitable choice of $\lambda_n$ [18]:
(i) The regression errors $\epsilon_i$ are i.i.d. with $\tau$th quantile zero and a continuous, positive density $f(\cdot)$ in a neighborhood of zero, with distribution function $F$ [25]. Note that $F(0) = \tau$ and $|f(y) - f(0)| \le c|y|^{1/2}$ for all $y$ in a neighborhood of 0 and quantile level $\tau \in (0, 1)$.
(ii) Let $\boldsymbol{\Omega} = \mathrm{diag}(\omega_1, \omega_2, \ldots, \omega_n)$, where the $\omega_i$, $i = 1, 2, \ldots, n$, are known positive values satisfying $\max_i\{\omega_i\} = O(1)$ [14].
(iii) Consider the design $\mathbf{X}$ such that there exists a positive definite matrix $\boldsymbol{\Sigma} = \lim_{n \to \infty} \mathbf{X}^{*\top}\mathbf{X}^{*}/n$ with

$$\mathbf{X}^{*\top}\mathbf{X}^{*} = \mathbf{X}^{\top}\boldsymbol{\Omega}\mathbf{X} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix},$$

where $\boldsymbol{\Sigma}_{11} \in \mathbb{R}^{p_0 \times p_0}$ and $\boldsymbol{\Sigma}_{22} \in \mathbb{R}^{(p - p_0) \times (p - p_0)}$.
Let the QR-ALASSO problem in Equation (4) have $\hat{\boldsymbol{\beta}}^{ALASSO}$ as its solution. We now state a theorem giving an asymptotic oracle property for i.i.d. random error terms (Theorem 1).
Theorem 1. 
Consider a sample $\{(\mathbf{x}_i, y_i), i = 1, \ldots, n\}$ from model (1) satisfying Conditions (i) and (iii) with $\boldsymbol{\Omega} = \mathbf{I}_n$ (constant weight of 1). If $\sqrt{n}\,\lambda_n \to 0$ and $n^{(\gamma+1)/2}\lambda_n \to \infty$, then we have
  • Sparsity: $\hat{\boldsymbol{\beta}}_2^{ALASSO} = \mathbf{0}$;
  • Asymptotic normality: $\sqrt{n}\,\big(\hat{\boldsymbol{\beta}}_1^{ALASSO} - \boldsymbol{\beta}_1\big)$ converges in distribution to $N\left(\mathbf{0}, \dfrac{\tau(1-\tau)}{f(0)^2}\,\boldsymbol{\Sigma}_{11}^{-1}\right)$.
Now, consider the extension of the oracle results to non-i.i.d. random error scenarios. The following additional assumptions are made.
(iv) As $n \to \infty$, $\max_{1 \le i \le n}\{\mathbf{x}_i^{\top}\mathbf{x}_i / n\} \to 0$.
(v) The random errors $\epsilon_i$ are independent with $F_i(t) = P(\epsilon_i \le t)$ the distribution function of $\epsilon_i$. We assume that each $f_i(\cdot)$ is locally linear near zero (with a positive slope) and that $F_i(0) = \tau$. Define $\psi_{ni}(t) = \int_0^t \sqrt{n}\,\big(F_i(s/\sqrt{n}) - F_i(0)\big)\,ds$, which is a convex function for each $n$ and $i$.
(vi) Assume that, for each $\mathbf{u}$, $(1/n)\sum_{i=1}^{n} \psi_{ni}(\mathbf{u}^{\top}\mathbf{x}_i) \to \varsigma(\mathbf{u})$, where $\varsigma(\cdot)$ is a strictly convex function taking values in $[0, \infty)$.
Corollary 1. 
Under Conditions (ii), (iii) and (iv), Theorem 1 holds, provided the non-i.i.d. random errors satisfy (v) and (vi) (see also [26]).
Remark 1. 
The QR model, a non-i.i.d. random error model, is catered for by (v) [27]. See the proofs online [18].

3. Simulation Study

In this section, we discuss the simulation results for four adaptive variable selection and estimation procedures, namely QR-ALASSO, QR-AE-NET (with α = 0.5), WQR-ALASSO and WQR-AE-NET (with α = 0.5), against their non-adaptive versions. We evaluate and illustrate these four procedures' variable selection and prediction performance under normality and under t-distributions (heavy-tailed error distributions) in the presence of collinearity influential and high leverage observations in the design space. The pairwise comparisons are two-pronged, viz., (i) comparison of the adaptive (QR-ALASSO and QR-AE-NET) versus the non-adaptive (QR-LASSO and QR-E-NET) procedures and (ii) comparison of the weighted adaptive (WQR-ALASSO and WQR-AE-NET) versus the weighted non-adaptive (WQR-LASSO and WQR-E-NET) procedures. Cross comparisons between scenarios (i) and (ii) are also explored. All simulations are run at two QR levels, τ ∈ {0.25, 0.50}, and two sample sizes, n ∈ {50, 100}, although the results for n = 100 are omitted from the presentation for brevity.

3.1. Simulation Design Scenarios

In the design space, we consider four collinearity influential point scenarios in addition to an orthogonal design matrix and a correlated design matrix with high leverage points, while for the error term distribution we consider the normal case and heavy-tailed cases (t-distributions with different degrees of freedom). The design matrix choices are simulated as in Jongh et al. [14,28,29], viz.
(i) D1: the well-behaved orthogonalized n × p design matrix X (where the p = 8 columns of the initial unorthogonalized matrix are generated from N(0, 1)), satisfying the condition X′X = nI. We first generate the n × p data matrix W, where w_ij ∼ N(0, 1) with i = 1, 2, …, n and j = 1, 2, …, p. We then find the singular value decomposition (SVD) of W, given by W = UDV′, where U and V have orthonormal columns and D is diagonal, its diagonal entries being the singular values of W. Finally, the design matrix is X = √n U, so that X′X = nI since U has orthonormal columns. We use the design matrix D1 as a baseline when comparing with scenarios D2–D5.
(ii) D2/D4: the design matrix D1 with the most extreme point (by Euclidean distance) moved 10 units in the X-direction (D2) and 100 units in the X-direction (D4). The resultant extreme points are collinearity inducing points in scenarios D2 and D4 (see Figure 1).
(iii) D3/D5: the design matrix D1 with the most extreme and second most extreme points (by Euclidean distance) moved 10 and 100 units, respectively, in the X-direction. The two extreme points have a collinearity masking effect (see scenarios D3 and D5 in Figure 1).
(iv) D6: a correlated design matrix with high leverage points [14,30]. In D6, we partition the rows of the n × p design matrix X into an uncontaminated part X1, drawn from N(0, V), and a contaminated m × p sub-matrix X2, drawn from N(1, I) (1 is a mean vector of ones and I an identity matrix). The (ij)th entry of the covariance matrix V is generated by the exponential decay 0.5^|i−j|, for i, j = 1, 2, …, 8, and 0 is a mean vector of zeros. The design matrix D6 has m ∈ {5, 10} contamination points (a 10% contamination rate) out of n ∈ {50, 100} observations.
In the Gaussian error term case, for n observations and p predictor variables, the dependent variable is generated by

$$Y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \sigma e_i, \quad i \in [1:n],$$

where σ ∈ {1, 3} is the standard deviation, e_i ∼ N(0, 1) is the error term, β = (3, 1.5, 0, 0, 2, 0, 0, 0)′ and the other terms are defined in Equation (1) (scenarios D1–D5). Under the heavy-tailed distributions, the dependent variable is generated by the same model with σ ∈ {0.5, 1} and e_i ∼ t_d with d ∈ {1, 6} degrees of freedom (scenario D6). A sketch of this data-generating setup is given below.
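The following R sketch illustrates the D1/D2 design generation and the Gaussian response; the variable names are ours, and the exact displacement rule for the extreme point is one possible reading of the description above:

```r
set.seed(2022)
n <- 50; p <- 8

# D1: orthogonalize W via its SVD and rescale so that X'X = nI
W <- matrix(rnorm(n * p), n, p)
X <- sqrt(n) * svd(W)$u
max(abs(crossprod(X) - n * diag(p)))   # ~ 0, confirming X'X = nI

# Gaussian response with beta = (3, 1.5, 0, 0, 2, 0, 0, 0) and sigma in {1, 3}
beta  <- c(3, 1.5, 0, 0, 2, 0, 0, 0)
sigma <- 1
y <- drop(X %*% beta) + sigma * rnorm(n)

# D2: displace the most extreme point (largest Euclidean norm) by 10 units
# in the X-direction (here: along its own direction, one possible reading)
i_max <- which.max(sqrt(rowSums(X^2)))
X_D2 <- X
X_D2[i_max, ] <- X_D2[i_max, ] * (1 + 10 / sqrt(sum(X[i_max, ]^2)))
```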
The number of simulation runs for each of the scenarios D1–D6 is 200, and 10-fold cross-validation (CV) is applied to select the best tuning parameter (optimal λ) on the test data.
The different scenarios of the ALASSO- and AE-NET-penalized QRs (weighted and unweighted) are explored using hqreg, a readily available R add-on package [31]. The hqreg package implements a semismooth Newton coordinate descent algorithm and chooses the optimal λ as in Ranganai and Mudhombo [14].
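The tuning step might then look as follows (a sketch; cv.hqreg's argument and field names are assumptions to check against the hqreg documentation):

```r
# 10-fold CV over the lambda path for QR-ALASSO at tau = 0.25
cvfit <- cv.hqreg(X, y, method = "quantile", tau = 0.25, alpha = 1,
                  penalty.factor = w_tilde, nfolds = 10, seed = 123)
cvfit$lambda.min          # the optimal lambda selected by CV
beta_hat <- coef(cvfit)   # coefficients at lambda.min (assumed default)
```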

3.2. Results

To study the robustness and performance of the different ALASSO- and AE-NET-penalized QR procedures in the presence of collinearity influential points, high leverage points and influential points under different distributions, we present the simulation results. These results are based on the following metrics: the median of the test errors, median_{1≤i≤n}{ε_i}, with its respective measure of dispersion, the MAD of the test errors; the percentage (%) of correctly fitted models; and the average numbers of correct/incorrect zero coefficients. The MAD of the test errors for each penalized estimator is estimated by MAD = 1.4826 · median_{1≤i≤n}|ε_i − median_{1≤i≤n}{ε_i}|. We compare the performance of the non-adaptive procedures (QR-LASSO and QR-E-NET) with that of the adaptive procedures suggested here, based on a QR adaptive weight, viz., the QR-ALASSO, QR-AE-NET, WQR-ALASSO and WQR-AE-NET procedures. Without loss of generality, we consider all procedures at the τ = 0.25 and τ = 0.50 QR levels only.
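For reference, given a vector of test errors, the two prediction metrics compute as follows in R (stats::mad uses the same 1.4826 consistency constant by default):

```r
eps <- rnorm(20)                           # placeholder test errors
median(eps)                                # median test error
1.4826 * median(abs(eps - median(eps)))    # MAD of the test errors
mad(eps)                                   # identical: stats::mad, default constant
```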
Remark 2. 
The five zero coefficients correspond to the set {β₃, β₄, β₆, β₇, β₈}; hence, the maximum average number of correctly/incorrectly selected (shrunk) zero coefficients is 5, while the proportion of correctly selected models is given as a percentage.

3.2.1. D1 under the Normal and t-Distributions

Results for the design matrix D1 (the orthogonal design or baseline case) under normality and under heavy-tailed t-distributed errors with d = 1 degrees of freedom are shown in Table 1. In all 16 pairwise cases except one, the QR-ALASSO and QR-AE-NET procedures outperform their respective non-adaptive penalized procedures at the τ ∈ {0.25, 0.50} QR levels with respect to all measures. The exception is the LASSO-penalized procedure, which outperforms all adaptive penalized procedures in the heavy-tailed (t-distribution) scenario when d = 1, σ = 1 and τ = 0.50. The QR-ALASSO outperforms the QR-AE-NET procedure in both the Gaussian and t-distribution cases in this baseline scenario with respect to the MAD of the test errors. Generally, the adaptive penalized procedures QR-ALASSO and QR-AE-NET correctly shrink the covariate parameters (β_j : j = 3, 4, 6, 7, 8) to zero with high precision (in all cases) compared to their respective non-adaptive penalized procedures. However, performance with respect to all measures decreases as the error term distribution becomes heavier.

3.2.2. D2 and D4 under the Normal Distribution

Results for the design matrices with collinearity inducing high leverage points (D2 and D4) are shown in Table 2 and Table A1. In the unweighted penalized QR case, the adaptive procedures outperform the non-adaptive procedures in all 16 (100%) respective pairwise comparisons with respect to correctly selected (shrunk) zero coefficients, in 14 (88%) comparisons with respect to the percentage of correctly fitted models, and in 13 (81%) comparisons with respect to prediction. In the weighted penalized WQR scenarios, WQR-ALASSO and WQR-AE-NET outperform their respective non-adaptive procedures in 10 (63%) comparisons with respect to prediction and in 12 (75%) with respect to both the percentage of correctly fitted models and correctly shrunk zero coefficients. The results at the τ = 0.50 and τ = 0.25 RQ levels are similar for D4 (see Table A1 for the results at τ = 0.50).
Again, performance with respect to all measures decreases as the error term variability increases, i.e., as σ increases, the effectiveness of correctly shrinking zero coefficients tends to be compromised. The pairwise comparisons between the weighted versions of the penalized procedures (WQR-ALASSO and WQR-AE-NET) and their respective unweighted versions (QR-ALASSO and QR-AE-NET) result in the former outperforming the latter in all cases with respect to prediction and in the majority of cases with respect to the other two measures. Overall, the WQR-ALASSO generally outperforms all the other models with respect to all measures.

3.2.3. D3 and D5 under the Normal Distribution

The results in Table 3 and Table A2 show variable/model selection and prediction performance in the presence of collinearity hiding points (D3 and D5) under the Gaussian distribution. In the unweighted scenarios, the 16 pairwise comparisons show that the adaptive versions of the penalized procedures (QR-ALASSO and QR-AE-NET) dominate their non-adaptive versions as follows: in prediction, they perform better 100% and 88% of the time, respectively; with respect to correctly fitted models, 100% and 50% of the time, respectively (performing equally in the remaining 50%); and with respect to correctly shrinking zero coefficients, both perform better 100% of the time. The picture in the weighted scenario is as follows: with respect to prediction, both WQR-LASSO and WQR-E-NET outperform their respective adaptive versions 63% of the time; with respect to correctly fitted models, the adaptive versions WQR-ALASSO and WQR-AE-NET outperform their non-adaptive counterparts 88% of the time, while with respect to correctly shrinking zero coefficients, they do so 100% of the time.
Thus, the adaptive weights improve the performance of the models with respect to all measures in the unweighted scenario, and with respect to correctly fitted models and correctly shrinking zero coefficients in the weighted scenario. However, in the weighted scenario, the adaptive weights hamper the prediction performance of the models. Although the τ = 0.25 results at D5 are not included in the main text, they are similar to those at τ = 0.50 (see Table A2).

3.2.4. D2 and D3 under the t-Distribution

The heavy-tailed t-distribution results for D2 and D3, with collinearity inducing and collinearity hiding observations, respectively, are shown in Table 4, Table 5, Table A3 and Table A4. For brevity, the results for the heavy-tailed D4 and D5 scenarios are not reported, since they are similar to those for D2 and D3.
At the predictor matrix with collinearity-inducing points (D2) under the t-distribution, all adaptive versions outperform their non-adaptive versions with respect to all metrics. In the unweighted scenarios, the 16 pairwise comparisons demonstrate that the adaptive versions (QR-ALASSO and QR-AE-NET) dominate their non-adaptive versions as follows: in prediction, each performs better 88% of the time; with respect to correctly fitted models, they perform better 100% and 38% of the time, respectively (performing equally in the remaining 62%); and with respect to correctly shrinking zero coefficients, they outperform their non-adaptive counterparts 100% and 88% of the time, respectively.
In the weighted scenarios, the 16 pairwise comparisons demonstrate that the adaptive versions (WQR-ALASSO and WQR-AE-NET) dominate their non-adaptive versions as follows: in prediction, they perform better 100% and 75% of the time, respectively; with respect to correctly fitted models, each performs better 88% of the time; and with respect to correctly shrinking zero coefficients, they outperform their non-adaptive counterparts 88% and 100% of the time, respectively.
The pairwise comparisons between the weighted penalized procedures (WQR-ALASSO and WQR-AE-NET) and their unweighted versions (QR-ALASSO and QR-AE-NET) result in the former outperforming the latter in all respective cases with respect to prediction, while there is no clear "winner" with respect to the other two measures. Overall, the WQR-ALASSO generally outperforms all other models with respect to all measures.
At the predictor matrix with collinearity-masking points (D3) under the t-distribution, all adaptive versions outperform their non-adaptive versions with respect to all metrics. In the unweighted scenarios, the 16 pairwise comparisons demonstrate that the adaptive versions (QR-ALASSO and QR-AE-NET) dominate their non-adaptive versions as follows: in prediction, they perform better 63% and 75% of the time, respectively; with respect to correctly fitted models, 100% and 38% of the time, respectively (performing equally in the remaining 62%); and with respect to correctly shrinking zero coefficients, 100% of the time in both cases.
In the weighted scenarios, the 16 pairwise comparisons of the adaptive versions (WQR-ALASSO and WQR-AE-NET) against their non-adaptive versions show that the dominance of the former is somewhat reduced, though they still generally outperform the latter. With respect to prediction, the adaptive versions are outperformed by their non-adaptive versions 62% of the time in both cases, while the opposite holds for the other two metrics: the adaptive versions perform better 62% of the time with respect to correctly fitted models and 100% of the time in both cases with respect to correctly shrinking zero coefficients.
The pairwise comparisons between the weighted penalized procedures (WQR-ALASSO and WQR-AE-NET) and their unweighted versions (QR-ALASSO and QR-AE-NET) result in the former outperforming the latter in all respective cases with respect to prediction, while doing marginally better with respect to the other two measures. Overall, the WQR-ALASSO generally outperforms all other models with respect to all measures.

3.2.5. D6 under the t-Distribution

The variable/model selection and prediction performance results for the design matrix with collinearity and high leverage points (D6) under heavy-tailed distributions (t-distributions with d ∈ {1, 6} degrees of freedom) are shown in Table 6 and Table A5. In the unweighted scenarios, the 16 pairwise comparisons demonstrate that the adaptive versions (QR-ALASSO and QR-AE-NET) dominate their non-adaptive versions as follows: in prediction, they perform better 75% and 62% of the time, respectively; with respect to correctly fitted models, 100% and 88% of the time, respectively; and with respect to correctly shrinking zero coefficients, 100% of the time in both cases.
In the weighted scenarios, the 16 pairwise comparisons demonstrate that the adaptive versions (WQR-ALASSO and WQR-AE-NET) dominate their non-adaptive versions as follows: with respect to prediction, they perform better 100% of the time in both cases, and with respect to both correctly fitted models and correctly shrinking zero coefficients, 88% of the time.
The pairwise comparisons between the weighted penalized procedures (WQR-ALASSO and WQR-AE-NET) and their unweighted versions (QR-ALASSO and QR-AE-NET) result in the former outperforming the latter in all cases with respect to prediction, while there is no clear "winner" with respect to the other two measures. Overall, the WQR-ALASSO generally outperforms all other models with respect to all measures.

3.3. Examples

In this section, we illustrate the efficacy of the weighted QR-RIDGE-based adaptive weights in penalized QR using two data sets from the literature that are often used to demonstrate robust methodologies against collinearity influential points as well as high leverage points in general, viz., the Jet-Turbine Engine [32] and the Gunst and Mason [33] data sets. The Jet-Turbine Engine data set, which has high leverage collinearity reducing points, is very popular with engineers (see [32,34]). In contrast, the Gunst and Mason data set, which has collinearity-inducing points, is very popular with statisticians (see also [33]).

3.4. The Jet-Turbine Engine Data

The Jet-Turbine Engine data set [32] consists of 40 observations, with the thrust of a jet-turbine engine as the response variable (Y) and the predictor variables: primary speed of rotation (X1), secondary speed of rotation (X2), fuel flow rate (X3), pressure (X4), exhaust temperature (X5) and ambient temperature at the time of test (X6). According to [34], observations 6 and 20 are high leverage collinearity reducing points. The data are standardized to correlation form, and the response variable is generated as Y1 = X1β1 + ε1, with ε1 ∼ t6, for the 38 observations excluding the collinearity reducing ones, with X2 and Y2 comprising cases (observations) 6 and 20, such that X = (X1′, X2′)′ and Y = (Y1′, Y2′)′, where β1 = (50, 0, 0, 10, 15, 0)′.
Table 7 summarizes the estimated QR βs and their biases from the true coefficients (β1 = (50, 0, 0, 10, 15, 0)′). In the unweighted scenario, the zero coefficients are shrunk to zero 33% of the time, with no clear "winner" between the adaptive and non-adaptive penalization. In the weighted scenario, the proportion of correctly shrunk coefficients increases to 54%, again with no clear "winner" between the adaptive and non-adaptive penalization. However, the adaptive penalization seems to guard against incorrectly shrinking non-zero coefficients to zero (see β4 under WQR-LASSO at τ = 0.50). As expected, the LASSO (and ALASSO) penalties outperform the E-NET (and AE-NET) penalties. From Figure 2a, it is clear that the weighting increases the variability of the residuals, as more outliers are flagged by the weighting.

3.5. The Gunst and Mason Data

The performance of the proposed adaptive procedures is assessed in this section using the Gunst and Mason [33] data set. The data set has 49 observations (countries), with gross national product (GNP) as the response variable and the predictor variables: infant death rate (INFD), physician-to-population ratio (PHYS), population density (DENS), density of agricultural land (AGDS), a measure of literacy (LIT) and a higher education index (HIED), corresponding to the response variable Y and the predictor variables X1, X2, X3, X4, X5 and X6, respectively. From the literature, strong collinearity exists between DENS (X3) and AGDS (X4), and a few observations are high leverage points. The following observations are considered outlying and influential: 7 (Canada), 13 (El Salvador), 17 (Hong Kong), 20 (India), 39 (Singapore) and 46 (United States of America), with observations 17 and 39 being high leverage collinearity inducing points. We standardize the data to correlation form and generate the response variable as Y1 = X1β1 + ε1, with ε1 ∼ t6, for the first 43 observations (without the influential observations 7, 13, 17, 20, 39 and 46), with X2 and Y2 comprising observations 7, 13, 17, 20, 39 and 46, such that X = (X1′, X2′)′ and Y = (Y1′, Y2′)′, where β1 = (0, 8, 13, 0, 0, 6)′.
Table 8 summarizes the estimated QR βs and their biases from the true coefficients (β1 = (0, 8, 13, 0, 0, 6)′). In the unweighted scenario, the zero coefficients are shrunk to zero 38% of the time: 25% among the non-adaptive penalization procedures and 50% among the adaptive ones. Thus, adaptive penalization clearly outperforms non-adaptive penalization. In the weighted scenario, the proportion of correctly shrunk coefficients increases to 83%: 67% among the non-adaptive penalization procedures and 100% among the adaptive ones. Again, adaptive penalization clearly outperforms non-adaptive penalization. Additionally, in the weighted scenario, the ALASSO and AE-NET perform equally. As expected, the LASSO (and ALASSO) penalties outperform the E-NET (and AE-NET) penalties. From Figure 2b, it is clear that the weighting increases the variability of the residuals, as more outliers are flagged by the weighting.

4. Discussion

In this article, we proposed ALASSO and AE-NET penalized QR variable selection and parameter estimation procedures (both weighted and unweighted). We used the proposed adaptive weights ω̃_j, based on β̂(τ)_WQR, a WQR-RIDGE parameter estimate, in our proposed adaptive penalized QR procedures. The proposed QR-ALASSO, QR-AE-NET, WQR-ALASSO and WQR-AE-NET variable selection and parameter estimation procedures were subjected to a simulation study. We now discuss the results of this simulation study and of the applications to well-known data sets from the literature.
The simulation results under the Gaussian distribution at D1–D6 (predictor matrices with collinearity influential points, and with collinearity and high leverage points) show that QR-ALASSO, QR-AE-NET, WQR-ALASSO and WQR-AE-NET outperform their non-adaptive counterparts at least 63% of the time in each respective scenario when the standard deviation of the error term is low, as in [30]. This is expected, as the adaptive penalized QR methods satisfy sparsity, are asymptotically normally distributed and balance the bias-variance trade-off. However, performance with respect to all measures becomes compromised as the error term distribution becomes heavier. The same pattern of results is also evident in the heavy-tailed t-distribution scenario. In the unweighted scenario, the QR-ALASSO is superior to the other penalized procedures in the presence of larger errors, in agreement with [16]'s results, where the ALASSO is superior to the LASSO. A similar pattern is observed in the weighted scenarios. Moreover, a comparison between the weighted and unweighted scenarios shows that the MCD weights enhance the efficacy of the penalized procedures at predictor matrices with collinearity influential observations and with collinearity and high leverage points (as demonstrated in [14]), with the WQR-ALASSO performing best.
The applications of the suggested procedures to real-life data sets are more or less in line with the simulation studies, with the procedures exhibiting higher efficacy at the predictor matrix with collinearity inducing points, i.e., the [33] data set, than at the one with collinearity reducing points, i.e., the [32] data set. However, it must be noted that the latter data set has the more severe collinearity inherent in it. The introduction of the collinearity reducing points into the "clean" data only managed to lower the condition number from 52.09 to 47.78, which indicates that the data are still highly collinear (see [34]). Again, the WQR-ALASSO performs best, as in the simulation studies. Thus, although variable selection and regularization are adversely affected by heavy-tailed error term distributions (outliers), the adaptive weights generally improve the efficacy of penalized QR procedures, with the MCD-based weights tending to increase their efficacy further. We recommend the use of our ALASSO penalty for variable selection, with the option of the AE-NET penalty if the objective of the study is prediction. Parameter estimation results from popular data sets in the literature support the applicability of our proposed methods.

Author Contributions

Conceptualization, I.M. and E.R.; methodology, I.M.; software, I.M.; validation, I.M. and E.R.; formal analysis, I.M.; investigation, I.M.; resources, I.M.; data curation, I.M.; writing—original draft preparation, I.M.; writing—review and editing, E.R.; visualization, E.R.; supervision, E.R.; project administration, I.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the University of South Africa.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We acknowledge support from University of South Africa and Vaal University of Technology.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
LS least squares
LAD least absolute deviation
QR quantile regression
LASSO least absolute shrinkage and selection operator
E-NET elastic net
SCAD smoothly clipped absolute deviation
MCP minimax concave penalty
WQR weighted quantile regression
ARIDGE adaptive RIDGE
QR-ALASSO quantile regression adaptive LASSO
QR-AE-NET quantile regression adaptive elastic net
WQR-RIDGE weighted quantile regression RIDGE
WQR-ALASSO weighted quantile regression adaptive LASSO
WQR-AE-NET weighted quantile regression adaptive elastic net
MCD minimum covariance determinant
WRR weighted RIDGE regression
RQ regression quantile

Appendix A

Table A1. Penalized weighted and unweighted quantile regression at D4 for n = 50 (τ = 0.50) under the normal distribution.

| Distribution | Parameter | Method | Median (MAD) Test Error | Correctly Fitted (%) | Correct Zeros (c.zero) | Incorrect Zeros (inc.zero) | Median (λ) |
|---|---|---|---|---|---|---|---|
| D4: N-distribution | σ = 1, τ = 0.50 | QR-LASSO | −0.03 (1.40) | 5.50 | 3.49 | 0.44 | 0.02 |
| | | QR-E-NET | −0.07 (3.56) | 0.00 | 1.06 | 0.01 | 0.02 |
| | | QR-ALASSO | 0.00 (1.37) | 88.50 | 5.00 | 0.15 | 0.01 |
| | | QR-AE-NET | −0.03 (3.45) | 55.00 | 4.00 | 0.02 | 0.02 |
| | σ = 3, τ = 0.50 | QR-LASSO | −1.31 (6.57) | 5.00 | 3.46 | 0.64 | 0.02 |
| | | QR-E-NET | −0.78 (5.30) | 0.00 | 1.60 | 0.36 | 0.02 |
| | | QR-ALASSO | −1.21 (7.80) | 0.00 | 4.97 | 1.62 | 0.00 |
| | | QR-AE-NET | −0.27 (5.07) | 0.00 | 4.31 | 1.50 | 0.01 |
| | σ = 1, τ = 0.50 | WQR-LASSO | 0.00 (0.85) | 67.50 | 4.58 | 0.00 | 0.04 |
| | | WQR-E-NET | 0.00 (0.90) | 33.00 | 4.05 | 0.00 | 0.04 |
| | | WQR-ALASSO | 0.02 (0.74) | 98.00 | 4.99 | 0.01 | 0.03 |
| | | WQR-AE-NET | 0.01 (0.82) | 94.50 | 4.95 | 0.01 | 0.06 |
| | σ = 3, τ = 0.50 | WQR-LASSO | −0.04 (2.33) | 32.50 | 4.64 | 0.59 | 0.04 |
| | | WQR-E-NET | −0.06 (2.34) | 20.50 | 4.01 | 0.42 | 0.04 |
| | | WQR-ALASSO | 0.02 (2.10) | 17.50 | 4.97 | 1.21 | 0.01 |
| | | WQR-AE-NET | 0.02 (2.20) | 15.50 | 4.95 | 1.26 | 0.02 |
Table A2. Penalized weighted and unweighted quantile regression at D5 for n = 50 (τ = 0.25) under the normal distribution.

| Distribution | Parameter | Method | Median (MAD) Test Error | Correctly Fitted (%) | Correct Zeros (c.zero) | Incorrect Zeros (inc.zero) | Median (λ) |
|---|---|---|---|---|---|---|---|
| D5: N-distribution | σ = 1, τ = 0.25 | QR-LASSO | 2.68 (4.14) | 12.00 | 5.00 | 1.89 | 0.01 |
| | | QR-E-NET | 3.27 (4.67) | 0.00 | 4.94 | 2.94 | 0.04 |
| | | QR-ALASSO | 1.15 (1.85) | 74.00 | 5.00 | 0.30 | 0.01 |
| | | QR-AE-NET | 3.26 (4.65) | 0.00 | 5.00 | 2.87 | 0.04 |
| | σ = 3, τ = 0.25 | QR-LASSO | 3.61 (5.43) | 1.00 | 5.00 | 2.57 | 0.05 |
| | | QR-E-NET | 3.68 (5.45) | 0.00 | 4.94 | 2.87 | 0.04 |
| | | QR-ALASSO | 3.42 (5.20) | 12.50 | 5.00 | 2.03 | 0.00 |
| | | QR-AE-NET | 3.54 (5.47) | 0.00 | 4.98 | 2.94 | 0.00 |
| | σ = 1, τ = 0.25 | WQR-LASSO | 0.50 (0.82) | 71.00 | 4.61 | 0.00 | 0.04 |
| | | WQR-E-NET | 0.53 (0.82) | 46.00 | 4.23 | 0.00 | 0.04 |
| | | WQR-ALASSO | 0.51 (0.75) | 99.50 | 5.00 | 0.01 | 0.04 |
| | | WQR-AE-NET | 0.52 (0.79) | 99.50 | 5.00 | 0.01 | 0.06 |
| | σ = 3, τ = 0.25 | WQR-LASSO | 1.57 (2.39) | 36.50 | 4.64 | 0.55 | 0.04 |
| | | WQR-E-NET | 1.67 (2.46) | 27.50 | 4.24 | 0.43 | 0.04 |
| | | WQR-ALASSO | 1.58 (2.36) | 41.50 | 4.96 | 0.67 | 0.02 |
| | | WQR-AE-NET | 1.71 (2.42) | 50.50 | 4.97 | 0.54 | 0.03 |
Table A3. Penalized weighted and unweighted quantile regression at D2 for n = 50 (τ = 0.50) under the heavy-tailed t-distribution.

| Distribution | Parameter | Method | Median (MAD) Test Error | Correctly Fitted (%) | Correct Zeros (c.zero) | Incorrect Zeros (inc.zero) | Median (λ) |
|---|---|---|---|---|---|---|---|
| D2: t-distribution | σ = 1, d = 6, τ = 0.50 | QR-LASSO | −0.01 (1.27) | 11.50 | 3.31 | 0.00 | 0.02 |
| | | QR-E-NET | −0.02 (1.60) | 0.00 | 1.89 | 0.00 | 0.02 |
| | | QR-ALASSO | 0.03 (1.23) | 87.00 | 4.87 | 0.00 | 0.02 |
| | | QR-AE-NET | 0.05 (1.45) | 0.00 | 3.05 | 0.00 | 0.03 |
| | σ = 0.50, d = 6, τ = 0.50 | QR-LASSO | −0.02 (0.61) | 10.50 | 3.32 | 0.00 | 0.03 |
| | | QR-E-NET | 0.00 (0.84) | 0.00 | 1.94 | 0.00 | 0.02 |
| | | QR-ALASSO | −0.02 (0.61) | 97.00 | 4.97 | 0.00 | 0.02 |
| | | QR-AE-NET | 0.01 (0.77) | 0.00 | 3.00 | 0.00 | 0.03 |
| | σ = 1, d = 6, τ = 0.50 | WQR-LASSO | −0.03 (0.83) | 48.00 | 4.29 | 0.00 | 0.04 |
| | | WQR-E-NET | −0.03 (0.86) | 3.00 | 2.69 | 0.00 | 0.04 |
| | | WQR-ALASSO | 0.03 (0.84) | 99.50 | 5.00 | 0.00 | 0.04 |
| | | WQR-AE-NET | 0.02 (0.88) | 84.50 | 4.84 | 0.00 | 0.06 |
| | σ = 0.50, d = 6, τ = 0.50 | WQR-LASSO | 0.00 (0.41) | 39.50 | 4.16 | 0.00 | 0.04 |
| | | WQR-E-NET | 0.00 (0.43) | 3.00 | 2.78 | 0.00 | 0.04 |
| | | WQR-ALASSO | 0.01 (0.42) | 99.50 | 5.00 | 0.00 | 0.05 |
| | | WQR-AE-NET | 0.01 (0.43) | 95.50 | 4.96 | 0.00 | 0.08 |
Table A4. Penalized weighted and unweighted quantile regression at D3 for n = 50 (τ = 0.25) under the heavy-tailed t-distribution.

Distribution | Parameter | Method | Median(MAD) Test Error | Correctly Fitted (%) | c.zero | inc.zero | Median(λ)
D3: t-distribution | σ = 1, d = 6, τ = 0.25 | QR-LASSO | 0.82(1.29) | 3.50 | 2.91 | 0.02 | 0.01
| | QR-E-NET | 1.57(2.23) | 0.00 | 0.17 | 0.00 | 0.01
| | QR-ALASSO | 0.81(1.24) | 96.50 | 4.97 | 0.00 | 0.03
| | QR-AE-NET | 1.06(1.53) | 0.00 | 3.00 | 0.00 | 0.03
| σ = 0.50, d = 6, τ = 0.25 | QR-LASSO | 0.37(0.66) | 4.00 | 2.84 | 0.00 | 0.01
| | QR-E-NET | 1.04(1.66) | 0.00 | 0.14 | 0.00 | 0.01
| | QR-ALASSO | 0.40(0.64) | 100.00 | 5.00 | 0.00 | 0.03
| | QR-AE-NET | 0.79(1.09) | 0.00 | 3.00 | 0.00 | 0.05
| σ = 1, d = 6, τ = 0.25 | WQR-LASSO | 0.40(0.76) | 49.00 | 4.28 | 0.00 | 0.04
| | WQR-E-NET | 0.42(0.77) | 4.50 | 2.89 | 0.00 | 0.04
| | WQR-ALASSO | 0.54(0.89) | 91.50 | 4.91 | 0.01 | 0.03
| | WQR-AE-NET | 0.59(0.95) | 64.00 | 4.55 | 0.00 | 0.04
| σ = 0.50, d = 6, τ = 0.25 | WQR-LASSO | 0.19(0.36) | 43.00 | 4.25 | 0.00 | 0.05
| | WQR-E-NET | 0.22(0.38) | 3.50 | 2.77 | 0.00 | 0.04
| | WQR-ALASSO | 0.27(0.45) | 96.50 | 4.97 | 0.00 | 0.04
| | WQR-AE-NET | 0.30(0.47) | 82.00 | 4.80 | 0.00 | 0.06
Table A5. Penalized weighted and unweighted quantile regression at D6 under the heavy-tailed t-distribution for n = 50 (τ = 0.50).

Distribution | Parameter | Method | Median(MAD) Test Error | Correctly Fitted (%) | c.zero | inc.zero | Median(λ)
D6: t-distribution | σ = 1, d = 6, τ = 0.50 | QR-LASSO | 0.05(1.26) | 51.00 | 4.35 | 0.00 | 0.05
| | QR-E-NET | 0.03(1.30) | 1.50 | 2.29 | 0.00 | 0.04
| | QR-ALASSO | 0.03(1.22) | 99.50 | 5.00 | 0.00 | 0.04
| | QR-AE-NET | 0.05(1.28) | 51.50 | 4.51 | 0.00 | 0.07
| σ = 0.50, d = 6, τ = 0.50 | QR-LASSO | 0.00(0.64) | 72.50 | 4.65 | 0.00 | 0.05
| | QR-E-NET | 0.01(0.67) | 5.00 | 3.00 | 0.00 | 0.04
| | QR-ALASSO | 0.00(0.63) | 96.50 | 4.96 | 0.00 | 0.04
| | QR-AE-NET | 0.00(0.64) | 24.50 | 4.00 | 0.00 | 0.06
| σ = 1, d = 6, τ = 0.50 | WQR-LASSO | 0.01(0.87) | 55.50 | 4.45 | 0.00 | 0.04
| | WQR-E-NET | −0.01(0.89) | 6.00 | 3.23 | 0.00 | 0.04
| | WQR-ALASSO | −0.01(0.63) | 94.50 | 4.93 | 0.00 | 0.04
| | WQR-AE-NET | −0.01(0.66) | 23.00 | 3.91 | 0.00 | 0.05
| σ = 0.50, d = 6, τ = 0.50 | WQR-LASSO | −0.01(0.44) | 44.00 | 4.19 | 0.00 | 0.04
| | WQR-E-NET | 0.00(0.47) | 5.00 | 3.13 | 0.00 | 0.04
| | WQR-ALASSO | 0.00(0.37) | 99.00 | 4.99 | 0.00 | 0.05
| | WQR-AE-NET | 0.00(0.37) | 85.00 | 4.85 | 0.00 | 0.07
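
To make the procedures behind these tables concrete, the following minimal R sketch illustrates the QR-ALASSO idea (R is also the language of the hqreg package cited in [31]). It is an illustration under stated assumptions, not the authors' simulation code: the design, the sparse coefficient vector, the weight exponent γ = 1, and the fixed penalty level λ are all chosen only for the example, and in practice λ would be tuned, e.g., by cross-validation.

## Minimal QR-ALASSO sketch (illustrative assumptions throughout).
## The adaptive weights come from an initial unpenalized QR fit at the
## same quantile level, since a RIDGE-based initial estimator may be
## unsuitable at extreme quantile levels; the weighted L1 penalty is
## handled by the standard column-rescaling trick.
library(quantreg)

set.seed(1)
n <- 50; p <- 8; tau <- 0.25
X <- matrix(rnorm(n * p), n, p)
beta <- c(3, 1.5, 0, 0, 2, 0, 0, 0)        # assumed sparse "true" model
y <- drop(X %*% beta) + rt(n, df = 6)      # heavy-tailed t6 errors

## Step 1: initial QR estimate supplies the adaptive weights (gamma = 1)
b0 <- coef(rq(y ~ X, tau = tau))[-1]
w  <- 1 / abs(b0)

## Step 2: penalizing w_j|beta_j| is equivalent to a plain LASSO on the
## rescaled columns X_j / w_j
Xs  <- sweep(X, 2, w, "/")
fit <- rq(y ~ Xs, tau = tau, method = "lasso", lambda = 1)   # lambda fixed here for illustration only

## Step 3: back-transform to the original scale; exact zeros indicate
## deselected predictors
beta_hat <- coef(fit)[-1] / w
round(beta_hat, 2)

The elastic-net analogues (QR-AE-NET and WQR-AE-NET) replace the pure L1 penalty with an L1/L2 mixture, which penalized-QR software such as hqreg [31] supports directly.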

References

1. Chatterjee, S.; Hadi, A.S. Impact of simultaneous omission of a variable and an observation on a linear regression equation. Comput. Stat. Data Anal. 1988, 6, 129–144.
2. Bloomfield, P.; Steiger, W. Least absolute deviations curve-fitting. SIAM J. Sci. Stat. Comput. 1980, 1, 290–301.
3. Koenker, R.; Bassett, G., Jr. Regression quantiles. Econom. J. Econom. Soc. 1978, 46, 33–50.
4. Breiman, L. Better subset regression using the nonnegative garrote. Technometrics 1995, 37, 373–384.
5. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
6. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
7. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320.
8. Yuan, M.; Lin, Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 2006, 68, 49–67.
9. Zhao, P.; Yu, B. On model selection consistency of Lasso. J. Mach. Learn. Res. 2006, 7, 2541–2563.
10. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
11. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
12. Karunamuni, R.J.; Kong, L.; Tu, W. Efficient robust doubly adaptive regularized regression with applications. Stat. Methods Med. Res. 2019, 28, 2210–2226.
13. Salibián-Barrera, M.; Wei, Y. Weighted quantile regression with nonelliptically structured covariates. Can. J. Stat. 2008, 36, 595–611.
14. Ranganai, E.; Mudhombo, I. Variable selection and regularization in quantile regression via minimum covariance determinant based weights. Entropy 2021, 23, 33.
15. Fan, J.; Peng, H. Nonconcave penalized likelihood with a diverging number of parameters. Ann. Stat. 2004, 32, 928–961.
16. Zou, H. The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
17. Frommlet, F.; Nuel, G. An adaptive ridge procedure for L0 regularization. PLoS ONE 2016, 11, e0148620.
18. Wu, Y.; Liu, Y. Variable selection in quantile regression. Stat. Sin. 2009, 19, 801–817.
19. Zou, H.; Zhang, H.H. On the adaptive elastic-net with a diverging number of parameters. Ann. Stat. 2009, 37, 1733–1751.
20. Rousseeuw, P. Multivariate estimation with high breakdown point. Math. Stat. Appl. 1985, 8, 283–297.
21. Zhao, Q. Restricted regression quantiles. J. Multivar. Anal. 2000, 72, 78–99.
22. Ranganai, E. Aspects of Model Development Using Regression Quantiles and Elemental Regressions. Ph.D. Thesis, Stellenbosch University, Stellenbosch, South Africa, 2007.
23. Giloni, A.; Simonoff, J.S.; Sengupta, B. Robust weighted LAD regression. Comput. Stat. Data Anal. 2006, 50, 3124–3140.
24. Hubert, M.; Rousseeuw, P.J. Robust regression with both continuous and binary regressors. J. Stat. Plan. Inference 1997, 57, 153–163.
25. Pollard, D. Asymptotics for least absolute deviation regression estimators. Econom. Theory 1991, 7, 186–199.
26. Knight, K. Asymptotics for L1-estimators of regression parameters under heteroscedasticity. Can. J. Stat. 1999, 27, 497–507.
27. Koenker, R. Quantile Regression; Econometric Society Monographs; Cambridge University Press: Cambridge, UK, 2005.
28. Jongh, P.J.D.; de Wet, T.; Welsh, A.H. Mallows-type bounded-influence-regression trimmed means. J. Am. Stat. Assoc. 1988, 83, 805–810.
29. Ranganai, E.; Vuuren, J.O.V.; Wet, T.D. Multiple case high leverage diagnosis in regression quantiles. Commun. Stat. Theory Methods 2014, 43, 3343–3370.
30. Arslan, O. Weighted LAD-LASSO method for robust parameter estimation and variable selection in regression. Comput. Stat. Data Anal. 2012, 56, 1952–1965.
31. Yi, C. R Add-On Software: hqreg. Available online: https://cloud.r-project.org/web/packages/hqreg (accessed on 20 February 2022).
32. Brownlee, K.A. Statistical Theory and Methodology in Science and Engineering; A Wiley Publication in Applied Statistics; Wiley: Hoboken, NJ, USA, 1965.
33. Ruppert, D.; Carroll, R.J. Trimmed least squares estimation in the linear model. J. Am. Stat. Assoc. 1980, 75, 828–838.
34. Cook, R.D. Influential observations in linear regression. J. Am. Stat. Assoc. 1979, 74, 169–174.
Figure 1. Collinearity influential points. The first-row panels show boxplots for the collinearity inducing points in D2 and D4; the second-row panels show boxplots for the collinearity masking points (collinearity hiding points) in D3 and D5.
Figure 2. Residual box plots for the data sets at the τ = 0.50 QR level. (a) Box plots for the Gunst and Mason data set; (b) box plots for the Jet-Turbine Engine data set.
Table 1. Penalized unweighted quantile regression at D1 for n = 50 (τ = 0.25 and τ = 0.50) under the normal and heavy-tailed t-distributions; bold text indicates superior performance.

Distribution | Parameter | Method | Median(MAD) Test Error | Correctly Fitted (%) | c.zero | inc.zero | Median(λ)
D1: N-distribution | σ = 1, τ = 0.25 | QR-LASSO | 0.72(1.17) | 62.00 | 4.40 | 0.00 | 0.04
| | QR-E-NET | 0.75(1.26) | 16.50 | 3.35 | 0.00 | 0.03
| | QR-ALASSO | 0.71(1.10) | 99.50 | 5.00 | 0.00 | 0.03
| | QR-AE-NET | 0.75(1.16) | 97.00 | 4.97 | 0.00 | 0.05
| σ = 3, τ = 0.25 | QR-LASSO | 2.20(3.60) | 44.00 | 4.39 | 0.28 | 0.04
| | QR-E-NET | 2.29(3.68) | 27.00 | 3.83 | 0.14 | 0.04
| | QR-ALASSO | 2.15(3.38) | 60.00 | 4.95 | 0.46 | 0.01
| | QR-AE-NET | 2.28(3.54) | 67.50 | 4.90 | 0.28 | 0.01
| σ = 1, τ = 0.50 | QR-LASSO | 0.02(1.16) | 65.50 | 4.51 | 0.00 | 0.04
| | QR-E-NET | 0.02(1.19) | 21.00 | 3.56 | 0.00 | 0.04
| | QR-ALASSO | 0.01(1.13) | 100.00 | 5.00 | 0.00 | 0.04
| | QR-AE-NET | 0.01(1.17) | 100.00 | 5.00 | 0.00 | 0.06
| σ = 3, τ = 0.50 | QR-LASSO | 0.09(3.47) | 47.50 | 4.51 | 0.22 | 0.05
| | QR-E-NET | 0.06(3.62) | 32.50 | 3.99 | 0.09 | 0.04
| | QR-ALASSO | 0.06(3.41) | 49.00 | 4.90 | 0.55 | 0.02
| | QR-AE-NET | 0.05(3.51) | 60.50 | 4.85 | 0.30 | 0.03
D1: t-distribution | σ = 1, d = 1, τ = 0.25 | QR-LASSO | 2.32(3.81) | 30.50 | 4.96 | 1.66 | 0.04
| | QR-E-NET | 2.55(3.87) | 29.50 | 4.77 | 1.46 | 0.03
| | QR-ALASSO | 2.36(3.78) | 11.50 | 4.97 | 1.95 | 0.00
| | QR-AE-NET | 2.52(3.93) | 12.50 | 4.94 | 1.86 | 0.00
| σ = 3, d = 1, τ = 0.25 | QR-LASSO | 4.70(7.15) | 1.50 | 5.00 | 2.93 | 0.06
| | QR-E-NET | 4.72(7.14) | 1.50 | 4.99 | 2.91 | 0.05
| | QR-ALASSO | 4.68(7.12) | 0.50 | 4.99 | 2.93 | 0.00
| | QR-AE-NET | 4.72(7.17) | 0.50 | 5.00 | 2.94 | 0.00
| σ = 1, d = 1, τ = 0.50 | QR-LASSO | 0.03(2.91) | 40.50 | 4.96 | 1.27 | 0.04
| | QR-E-NET | 0.02(3.27) | 29.00 | 4.64 | 1.10 | 0.03
| | QR-ALASSO | −0.07(3.24) | 32.50 | 4.99 | 1.56 | 0.00
| | QR-AE-NET | −0.05(3.43) | 37.00 | 4.93 | 1.38 | 0.00
| σ = 3, d = 1, τ = 0.50 | QR-LASSO | −0.14(6.89) | 1.50 | 4.99 | 2.85 | 0.05
| | QR-E-NET | −0.14(6.91) | 1.00 | 4.97 | 2.84 | 0.05
| | QR-ALASSO | −0.14(6.84) | 0.50 | 5.00 | 2.86 | 0.00
| | QR-AE-NET | −0.13(6.90) | 2.00 | 4.99 | 2.85 | 0.00
Table 2. Penalized weighted and unweighted quantile regression at D2 and D4 for n = 50 (τ = 0.25 and τ = 0.50) under the normal distribution; bold text indicates superior performance.

Distribution | Parameter | Method | Median(MAD) Test Error | Correctly Fitted (%) | c.zero | inc.zero | Median(λ)
D2: N-distribution | σ = 1, τ = 0.25 | QR-LASSO | −1.10(5.65) | 38.00 | 4.00 | 0.01 | 0.03
| | QR-E-NET | −0.93(5.55) | 2.50 | 2.55 | 0.00 | 0.02
| | QR-ALASSO | −1.08(5.60) | 91.50 | 4.92 | 0.01 | 0.02
| | QR-AE-NET | −0.36(5.01) | 81.50 | 4.81 | 0.00 | 0.03
| σ = 3, τ = 0.25 | QR-LASSO | 0.96(6.30) | 17.50 | 4.23 | 0.43 | 0.02
| | QR-E-NET | 1.25(5.85) | 9.50 | 3.39 | 0.16 | 0.02
| | QR-ALASSO | 0.90(6.45) | 56.50 | 4.98 | 0.48 | 0.01
| | QR-AE-NET | 1.35(5.64) | 72.50 | 4.93 | 0.22 | 0.02
| σ = 1, τ = 0.25 | WQR-LASSO | −1.48(4.49) | 62.50 | 4.47 | 0.00 | 0.03
| | WQR-E-NET | −1.56(4.61) | 11.00 | 3.34 | 0.00 | 0.03
| | WQR-ALASSO | −1.69(4.44) | 94.50 | 4.95 | 0.01 | 0.04
| | WQR-AE-NET | −1.60(4.26) | 92.00 | 4.92 | 0.00 | 0.06
| σ = 3, τ = 0.25 | WQR-LASSO | 0.50(4.63) | 29.50 | 4.53 | 0.62 | 0.03
| | WQR-E-NET | 0.76(4.31) | 28.00 | 4.34 | 0.54 | 0.04
| | WQR-ALASSO | 1.47(2.50) | 57.50 | 4.89 | 0.44 | 0.01
| | WQR-AE-NET | 1.50(2.53) | 53.00 | 4.76 | 0.38 | 0.01
| σ = 1, τ = 0.50 | QR-LASSO | −1.87(5.78) | 36.00 | 3.83 | 0.01 | 0.02
| | QR-E-NET | −1.91(5.72) | 0.00 | 1.94 | 0.00 | 0.02
| | QR-ALASSO | −1.91(5.76) | 98.50 | 4.99 | 0.00 | 0.03
| | QR-AE-NET | −1.61(5.17) | 94.00 | 4.94 | 0.00 | 0.05
| σ = 3, τ = 0.50 | QR-LASSO | −1.41(6.56) | 21.00 | 3.99 | 0.28 | 0.03
| | QR-E-NET | −1.34(6.21) | 4.50 | 2.53 | 0.09 | 0.02
| | QR-ALASSO | −1.53(6.65) | 60.00 | 4.78 | 0.25 | 0.02
| | QR-AE-NET | −1.17(5.97) | 34.50 | 4.19 | 0.07 | 0.02
| σ = 1, τ = 0.50 | WQR-LASSO | −2.22(4.24) | 66.50 | 4.57 | 0.00 | 0.04
| | WQR-E-NET | −2.04(4.39) | 30.00 | 3.86 | 0.00 | 0.04
| | WQR-ALASSO | −2.28(4.37) | 99.50 | 5.00 | 0.00 | 0.04
| | WQR-AE-NET | −2.20(4.27) | 99.00 | 4.99 | 0.00 | 0.06
| σ = 3, τ = 0.50 | WQR-LASSO | −1.06(4.49) | 36.50 | 4.69 | 0.54 | 0.04
| | WQR-E-NET | −0.99(4.28) | 28.00 | 4.34 | 0.37 | 0.05
| | WQR-ALASSO | −0.01(2.39) | 46.50 | 4.85 | 0.48 | 0.01
| | WQR-AE-NET | −0.03(2.49) | 36.00 | 4.61 | 0.39 | 0.01
D4: N-distribution | σ = 1, τ = 0.25 | QR-LASSO | 0.97(1.65) | 10.00 | 3.88 | 0.61 | 0.02
| | QR-E-NET | 2.74(3.71) | 1.00 | 1.69 | 0.16 | 0.01
| | QR-ALASSO | 0.88(1.50) | 65.00 | 5.00 | 0.42 | 0.01
| | QR-AE-NET | 2.41(3.44) | 41.00 | 4.54 | 0.16 | 0.01
| σ = 3, τ = 0.25 | QR-LASSO | 1.67(6.06) | 3.50 | 4.07 | 1.26 | 0.02
| | QR-E-NET | 2.64(5.30) | 0.00 | 2.25 | 0.75 | 0.01
| | QR-ALASSO | 2.60(7.05) | 0.00 | 4.32 | 1.69 | 0.00
| | QR-AE-NET | 2.83(5.13) | 0.00 | 2.86 | 1.19 | 0.00
| σ = 1, τ = 0.25 | WQR-LASSO | 0.47(0.78) | 64.50 | 4.51 | 0.00 | 0.03
| | WQR-E-NET | 0.49(0.85) | 30.00 | 3.96 | 0.00 | 0.04
| | WQR-ALASSO | 0.52(0.77) | 99.50 | 5.00 | 0.00 | 0.03
| | WQR-AE-NET | 0.56(0.78) | 99.00 | 4.99 | 0.00 | 0.05
| σ = 3, τ = 0.25 | WQR-LASSO | 1.37(2.31) | 15.00 | 4.46 | 1.09 | 0.04
| | WQR-E-NET | 1.52(2.38) | 35.00 | 4.29 | 0.28 | 0.04
| | WQR-ALASSO | 1.85(2.63) | 0.00 | 5.00 | 1.58 | 0.01
| | WQR-AE-NET | 1.93(2.72) | 0.00 | 4.96 | 1.51 | 0.02
Table 3. Penalized weighted and unweighted quantile regression at D3 and D5 for n = 50 (τ = 0.25 and τ = 0.50) under the normal distribution; bold text indicates superior performance.

Distribution | Parameter | Method | Median(MAD) Test Error | Correctly Fitted (%) | c.zero | inc.zero | Median(λ)
D3: N-distribution | σ = 1, τ = 0.25 | QR-LASSO | 0.65(1.21) | 60.50 | 4.45 | 0.00 | 0.01
| | QR-E-NET | 0.81(1.46) | 12.50 | 3.48 | 0.00 | 0.01
| | QR-ALASSO | 0.79(1.21) | 96.50 | 5.00 | 0.06 | 0.02
| | QR-AE-NET | 0.87(1.35) | 99.00 | 5.00 | 0.01 | 0.02
| σ = 3, τ = 0.25 | QR-LASSO | 2.01(3.79) | 42.50 | 4.48 | 0.35 | 0.01
| | QR-E-NET | 2.47(4.20) | 17.50 | 3.97 | 0.43 | 0.01
| | QR-ALASSO | 2.17(3.49) | 74.50 | 4.75 | 0.19 | 0.00
| | QR-AE-NET | 2.74(4.16) | 44.50 | 4.48 | 0.42 | 0.00
| σ = 1, τ = 0.25 | WQR-LASSO | 0.48(0.91) | 28.50 | 3.83 | 0.00 | 0.04
| | WQR-E-NET | 0.41(0.77) | 15.50 | 3.59 | 0.00 | 0.03
| | WQR-ALASSO | 0.48(0.70) | 98.00 | 4.99 | 0.01 | 0.04
| | WQR-AE-NET | 0.50(0.82) | 98.50 | 4.99 | 0.01 | 0.04
| σ = 3, τ = 0.25 | WQR-LASSO | 1.25(2.12) | 29.00 | 4.39 | 0.57 | 0.03
| | WQR-E-NET | 1.46(2.31) | 24.50 | 4.21 | 0.42 | 0.04
| | WQR-ALASSO | 0.24(4.81) | 40.00 | 4.90 | 0.77 | 0.02
| | WQR-AE-NET | 0.67(4.38) | 44.00 | 4.86 | 0.68 | 0.04
| σ = 1, τ = 0.50 | QR-LASSO | −0.01(1.19) | 62.50 | 4.48 | 0.00 | 0.02
| | QR-E-NET | −0.04(1.39) | 12.50 | 3.45 | 0.00 | 0.02
| | QR-ALASSO | 0.02(1.15) | 99.50 | 5.00 | 0.00 | 0.02
| | QR-AE-NET | −0.01(1.43) | 97.50 | 4.98 | 0.00 | 0.03
| σ = 3, τ = 0.50 | QR-LASSO | −0.08(3.61) | 53.00 | 4.54 | 0.30 | 0.02
| | QR-E-NET | −0.10(4.02) | 11.00 | 3.92 | 0.54 | 0.01
| | QR-ALASSO | 0.06(3.55) | 74.00 | 4.80 | 0.10 | 0.00
| | QR-AE-NET | 0.05(3.96) | 18.50 | 4.17 | 0.35 | 0.00
| σ = 1, τ = 0.50 | WQR-LASSO | −0.01(0.67) | 71.50 | 4.65 | 0.00 | 0.04
| | WQR-E-NET | −0.01(0.77) | 35.00 | 3.96 | 0.00 | 0.05
| | WQR-ALASSO | 0.01(0.78) | 100.00 | 5.00 | 0.00 | 0.04
| | WQR-AE-NET | 0.01(0.81) | 100.00 | 5.00 | 0.00 | 0.07
| σ = 3, τ = 0.50 | WQR-LASSO | −0.05(2.27) | 39.50 | 4.66 | 0.48 | 0.04
| | WQR-E-NET | −0.05(2.30) | 31.50 | 4.38 | 0.35 | 0.04
| | WQR-ALASSO | −1.04(4.52) | 51.00 | 4.96 | 0.72 | 0.03
| | WQR-AE-NET | −0.94(4.15) | 63.00 | 4.98 | 0.48 | 0.04
D5: N-distribution | σ = 1, τ = 0.50 | QR-LASSO | −0.06(2.34) | 54.50 | 4.98 | 0.80 | 0.00
| | QR-E-NET | −0.02(4.67) | 0.00 | 4.97 | 2.91 | 0.09
| | QR-ALASSO | 0.01(1.46) | 95.50 | 5.00 | 0.07 | 0.00
| | QR-AE-NET | 0.00(4.66) | 0.00 | 4.99 | 2.87 | 0.09
| σ = 3, τ = 0.50 | QR-LASSO | 0.08(5.40) | 11.50 | 4.98 | 2.56 | 0.01
| | QR-E-NET | −0.03(5.42) | 0.00 | 4.96 | 2.92 | 0.09
| | QR-ALASSO | 0.07(4.74) | 26.50 | 5.00 | 1.70 | 0.00
| | QR-AE-NET | −0.02(5.47) | 0.00 | 4.99 | 2.95 | 0.01
| σ = 1, τ = 0.50 | WQR-LASSO | 0.01(0.70) | 68.50 | 4.61 | 0.00 | 0.04
| | WQR-E-NET | 0.01(0.69) | 34.50 | 3.95 | 0.00 | 0.04
| | WQR-ALASSO | 0.01(0.72) | 100.00 | 5.00 | 0.00 | 0.04
| | WQR-AE-NET | 0.00(0.73) | 98.00 | 4.98 | 0.00 | 0.06
| σ = 3, τ = 0.50 | WQR-LASSO | 0.04(2.26) | 42.00 | 4.72 | 0.47 | 0.04
| | WQR-E-NET | 0.03(2.26) | 29.00 | 4.35 | 0.38 | 0.05
| | WQR-ALASSO | 0.00(2.28) | 17.50 | 4.99 | 0.91 | 0.01
| | WQR-AE-NET | −0.01(2.39) | 13.00 | 5.00 | 0.93 | 0.02
Table 4. Penalized weighted and unweighted quantile regression at D2 for n = 50 (τ = 0.25 and τ = 0.50) under the heavy-tailed t-distribution.

Distribution | Parameter | Method | Median(MAD) Test Error | Correctly Fitted (%) | c.zero | inc.zero | Median(λ)
D2: t-distribution | σ = 1, d = 1, τ = 0.25 | QR-LASSO | 2.49(4.51) | 1.50 | 4.26 | 1.50 | 0.02
| | QR-E-NET | 3.26(5.05) | 0.00 | 2.79 | 1.32 | 0.02
| | QR-ALASSO | 2.43(3.87) | 24.00 | 4.93 | 1.52 | 0.01
| | QR-AE-NET | 2.71(4.29) | 0.00 | 3.03 | 0.78 | 0.01
| σ = 0.50, d = 1, τ = 0.25 | QR-LASSO | 1.08(2.02) | 15.50 | 4.05 | 0.54 | 0.02
| | QR-E-NET | 1.75(3.09) | 0.00 | 1.30 | 0.20 | 0.02
| | QR-ALASSO | 1.16(2.16) | 52.00 | 4.88 | 0.76 | 0.02
| | QR-AE-NET | 1.45(2.38) | 0.50 | 3.27 | 0.32 | 0.03
| σ = 1, d = 1, τ = 0.25 | WQR-LASSO | 1.57(2.51) | 21.50 | 4.85 | 1.65 | 0.04
| | WQR-E-NET | 1.61(2.60) | 4.00 | 4.35 | 1.55 | 0.04
| | WQR-ALASSO | 1.49(2.63) | 8.00 | 4.79 | 2.12 | 0.00
| | WQR-AE-NET | 1.60(2.71) | 1.00 | 4.57 | 2.16 | 0.00
| σ = 0.50, d = 1, τ = 0.25 | WQR-LASSO | 0.81(1.41) | 46.50 | 4.74 | 0.82 | 0.04
| | WQR-E-NET | 0.89(1.56) | 5.00 | 3.77 | 0.74 | 0.04
| | WQR-ALASSO | 0.70(1.33) | 55.00 | 5.00 | 0.95 | 0.02
| | WQR-AE-NET | 0.79(1.42) | 51.00 | 4.90 | 0.88 | 0.03
| σ = 1, d = 1, τ = 0.50 | QR-LASSO | 0.00(3.64) | 3.00 | 3.97 | 1.20 | 0.02
| | QR-E-NET | −0.01(4.32) | 0.00 | 2.67 | 0.92 | 0.02
| | QR-ALASSO | 0.08(3.26) | 42.50 | 4.97 | 1.16 | 0.01
| | QR-AE-NET | 0.02(3.85) | 0.50 | 3.34 | 0.74 | 0.01
| σ = 0.50, d = 1, τ = 0.50 | QR-LASSO | −0.02(1.70) | 30.00 | 4.37 | 0.53 | 0.03
| | QR-E-NET | 0.01(2.49) | 0.00 | 1.64 | 0.27 | 0.02
| | QR-ALASSO | 0.04(1.64) | 77.00 | 5.00 | 0.54 | 0.02
| | QR-AE-NET | 0.01(1.98) | 2.00 | 3.34 | 0.33 | 0.02
| σ = 1, d = 1, τ = 0.50 | WQR-LASSO | 0.03(2.21) | 29.50 | 4.73 | 1.14 | 0.04
| | WQR-E-NET | 0.02(2.30) | 9.00 | 4.11 | 1.00 | 0.04
| | WQR-ALASSO | 0.02(2.21) | 30.50 | 4.94 | 1.52 | 0.01
| | WQR-AE-NET | 0.05(2.30) | 17.00 | 4.58 | 1.40 | 0.01
| σ = 0.50, d = 1, τ = 0.50 | WQR-LASSO | 0.01(1.15) | 51.50 | 4.67 | 0.51 | 0.04
| | WQR-E-NET | 0.01(1.25) | 7.00 | 3.51 | 0.46 | 0.04
| | WQR-ALASSO | 0.00(1.11) | 67.00 | 5.00 | 0.66 | 0.04
| | WQR-AE-NET | 0.00(1.22) | 69.50 | 4.98 | 0.59 | 0.07
D2: t-distribution | σ = 1, d = 6, τ = 0.25 | QR-LASSO | 0.82(1.30) | 14.00 | 3.40 | 0.01 | 0.02
| | QR-E-NET | 1.18(1.80) | 0.00 | 1.90 | 0.00 | 0.02
| | QR-ALASSO | 0.84(1.30) | 90.00 | 4.90 | 0.01 | 0.02
| | QR-AE-NET | 1.04(1.53) | 0.00 | 2.91 | 0.00 | 0.02
| σ = 0.50, d = 6, τ = 0.25 | QR-LASSO | 0.40(0.64) | 16.00 | 3.51 | 0.01 | 0.02
| | QR-E-NET | 0.62(1.02) | 0.00 | 1.90 | 0.00 | 0.02
| | QR-ALASSO | 0.38(0.63) | 98.00 | 4.98 | 0.00 | 0.02
| | QR-AE-NET | 0.63(1.03) | 0.00 | 2.28 | 0.00 | 0.02
| σ = 1, d = 6, τ = 0.25 | WQR-LASSO | 0.50(0.97) | 42.00 | 4.10 | 0.00 | 0.04
| | WQR-E-NET | 0.51(1.00) | 1.50 | 2.72 | 0.00 | 0.04
| | WQR-ALASSO | 0.55(0.91) | 92.00 | 4.92 | 0.01 | 0.03
| | WQR-AE-NET | 0.60(0.97) | 56.50 | 4.45 | 0.00 | 0.04
| σ = 0.50, d = 6, τ = 0.25 | WQR-LASSO | 0.26(0.49) | 31.00 | 3.92 | 0.00 | 0.04
| | WQR-E-NET | 0.27(0.51) | 2.50 | 2.62 | 0.00 | 0.04
| | WQR-ALASSO | 0.28(0.45) | 98.50 | 4.99 | 0.00 | 0.04
| | WQR-AE-NET | 0.30(0.49) | 83.50 | 4.82 | 0.00 | 0.05
Table 5. Penalized weighted and unweighted quantile regression at D3 for n = 50 (τ = 0.25 and τ = 0.50) under the heavy-tailed t-distribution; bold text indicates superior performance.

Distribution | Parameter | Method | Median(MAD) Test Error | Correctly Fitted (%) | c.zero | inc.zero | Median(λ)
D3: t-distribution | σ = 1, d = 1, τ = 0.25 | QR-LASSO | 1.32(2.39) | 17.00 | 3.84 | 0.43 | 0.03
| | QR-E-NET | 2.07(3.49) | 0.00 | 1.77 | 0.30 | 0.02
| | QR-ALASSO | 1.84(3.21) | 44.50 | 4.91 | 0.74 | 0.01
| | QR-AE-NET | 2.23(3.39) | 2.50 | 3.69 | 0.29 | 0.02
| σ = 0.50, d = 1, τ = 0.25 | QR-LASSO | 0.77(1.28) | 56.00 | 4.55 | 0.25 | 0.03
| | QR-E-NET | 1.75(2.65) | 0.00 | 2.17 | 0.20 | 0.02
| | QR-ALASSO | 0.90(1.56) | 84.50 | 4.99 | 0.27 | 0.03
| | QR-AE-NET | 1.44(2.04) | 2.00 | 4.01 | 0.18 | 0.04
| σ = 1, d = 1, τ = 0.25 | WQR-LASSO | 1.50(2.62) | 37.00 | 4.96 | 1.51 | 0.04
| | WQR-E-NET | 1.61(2.78) | 17.00 | 4.45 | 1.36 | 0.04
| | WQR-ALASSO | 1.57(2.57) | 21.50 | 4.99 | 1.78 | 0.02
| | WQR-AE-NET | 1.62(2.58) | 22.50 | 4.93 | 1.61 | 0.03
| σ = 0.50, d = 1, τ = 0.25 | WQR-LASSO | 0.67(1.30) | 67.50 | 4.87 | 0.54 | 0.04
| | WQR-E-NET | 0.77(1.47) | 24.00 | 3.91 | 0.47 | 0.04
| | WQR-ALASSO | 0.88(1.52) | 55.50 | 4.99 | 0.91 | 0.03
| | WQR-AE-NET | 0.91(1.56) | 55.50 | 4.88 | 0.77 | 0.04
| σ = 1, d = 1, τ = 0.50 | QR-LASSO | −0.02(2.39) | 28.50 | 4.30 | 0.46 | 0.06
| | QR-E-NET | −0.07(3.56) | 1.00 | 1.52 | 0.25 | 0.02
| | QR-ALASSO | 0.02(3.05) | 27.50 | 4.81 | 0.86 | 0.01
| | QR-AE-NET | 0.04(3.43) | 0.00 | 3.08 | 0.48 | 0.02
| σ = 0.50, d = 1, τ = 0.50 | QR-LASSO | 0.01(1.23) | 63.00 | 4.71 | 0.31 | 0.04
| | QR-E-NET | 0.00(2.15) | 0.00 | 2.08 | 0.21 | 0.02
| | QR-ALASSO | 0.01(1.47) | 79.00 | 5.00 | 0.37 | 0.03
| | QR-AE-NET | 0.00(2.12) | 3.50 | 3.81 | 0.24 | 0.03
| σ = 1, d = 1, τ = 0.50 | WQR-LASSO | −0.04(2.30) | 51.00 | 4.96 | 1.12 | 0.05
| | WQR-E-NET | −0.03(2.48) | 21.00 | 4.21 | 0.96 | 0.05
| | WQR-ALASSO | 0.02(2.29) | 24.50 | 4.99 | 1.46 | 0.01
| | WQR-AE-NET | 0.05(2.36) | 20.00 | 4.90 | 1.31 | 0.02
| σ = 0.50, d = 1, τ = 0.50 | WQR-LASSO | −0.01(1.17) | 69.50 | 4.88 | 0.49 | 0.04
| | WQR-E-NET | −0.01(1.29) | 10.50 | 3.36 | 0.40 | 0.04
| | WQR-ALASSO | 0.01(1.16) | 69.50 | 5.00 | 0.61 | 0.03
| | WQR-AE-NET | 0.02(1.26) | 73.50 | 4.98 | 0.50 | 0.05
D3: t-distribution | σ = 1, d = 6, τ = 0.50 | QR-LASSO | 0.02(1.26) | 2.50 | 2.67 | 0.01 | 0.02
| | QR-E-NET | 0.00(2.48) | 0.00 | 0.02 | 0.02 | 0.02
| | QR-ALASSO | 0.03(1.22) | 98.50 | 4.99 | 0.00 | 0.04
| | QR-AE-NET | 0.01(1.77) | 0.00 | 3.36 | 0.00 | 0.04
| σ = 0.50, d = 6, τ = 0.50 | QR-LASSO | −0.02(0.64) | 1.50 | 2.62 | 0.00 | 0.01
| | QR-E-NET | −0.07(2.03) | 0.00 | 0.01 | 0.00 | 0.02
| | QR-ALASSO | 0.01(0.61) | 100.00 | 5.00 | 0.00 | 0.05
| | QR-AE-NET | 0.00(1.11) | 0.00 | 4.00 | 0.00 | 0.05
| σ = 1, d = 6, τ = 0.50 | WQR-LASSO | −0.01(0.69) | 53.50 | 4.40 | 0.00 | 0.05
| | WQR-E-NET | −0.01(0.73) | 7.00 | 2.84 | 0.00 | 0.04
| | WQR-ALASSO | 0.01(0.84) | 98.50 | 4.99 | 0.00 | 0.04
| | WQR-AE-NET | 0.01(0.88) | 81.00 | 4.81 | 0.00 | 0.07
| σ = 0.50, d = 6, τ = 0.50 | WQR-LASSO | 0.00(0.33) | 47.50 | 4.26 | 0.00 | 0.05
| | WQR-E-NET | 0.00(0.36) | 4.00 | 2.84 | 0.00 | 0.05
| | WQR-ALASSO | 0.01(0.42) | 99.50 | 5.00 | 0.00 | 0.05
| | WQR-AE-NET | 0.01(0.43) | 93.00 | 4.93 | 0.00 | 0.08
Table 6. Penalized weighted and unweighted quantile regression at D6 under the heavy-tailed t-distribution for n = 50 (τ = 0.25 and τ = 0.50); bold text indicates superior performance.

Distribution | Parameter | Method | Median(MAD) Test Error | Correctly Fitted (%) | c.zero | inc.zero | Median(λ)
D6: t-distribution | σ = 1, d = 1, τ = 0.25 | QR-LASSO | 2.54(4.43) | 22.50 | 4.71 | 1.34 | 0.04
| | QR-E-NET | 2.79(4.69) | 3.00 | 3.06 | 0.96 | 0.04
| | QR-ALASSO | 2.43(4.26) | 19.50 | 4.74 | 1.19 | 0.02
| | QR-AE-NET | 2.71(4.66) | 0.00 | 3.82 | 1.07 | 0.02
| σ = 0.50, d = 1, τ = 0.25 | QR-LASSO | 1.10(1.95) | 25.50 | 4.15 | 0.52 | 0.03
| | QR-E-NET | 1.34(2.37) | 0.00 | 2.36 | 0.44 | 0.03
| | QR-ALASSO | 1.15(2.14) | 37.00 | 4.73 | 0.77 | 0.03
| | QR-AE-NET | 1.27(2.39) | 8.00 | 4.22 | 0.45 | 0.04
| σ = 1, d = 1, τ = 0.25 | WQR-LASSO | 1.63(3.05) | 14.00 | 4.66 | 1.52 | 0.04
| | WQR-E-NET | 1.82(3.22) | 1.00 | 3.61 | 1.07 | 0.04
| | WQR-ALASSO | 1.60(2.86) | 14.00 | 4.94 | 1.95 | 0.02
| | WQR-AE-NET | 1.75(3.06) | 11.50 | 4.80 | 1.79 | 0.04
| σ = 0.50, d = 1, τ = 0.25 | WQR-LASSO | 1.08(1.99) | 67.00 | 4.80 | 0.38 | 0.04
| | WQR-E-NET | 1.27(2.23) | 1.00 | 2.45 | 0.32 | 0.03
| | WQR-ALASSO | 0.80(1.43) | 10.00 | 3.95 | 0.69 | 0.00
| | WQR-AE-NET | 0.96(1.71) | 0.50 | 2.79 | 0.66 | 0.00
| σ = 1, d = 1, τ = 0.50 | QR-LASSO | −0.06(3.48) | 37.00 | 4.69 | 0.90 | 0.04
| | QR-E-NET | −0.06(3.72) | 1.50 | 2.67 | 0.76 | 0.04
| | QR-ALASSO | −0.02(3.45) | 41.50 | 4.91 | 0.87 | 0.02
| | QR-AE-NET | 0.06(3.82) | 2.00 | 3.99 | 0.58 | 0.03
| σ = 0.50, d = 1, τ = 0.50 | QR-LASSO | −0.04(1.62) | 33.00 | 4.30 | 0.36 | 0.04
| | QR-E-NET | 0.00(1.83) | 0.00 | 2.45 | 0.33 | 0.03
| | QR-ALASSO | −0.01(1.81) | 64.50 | 4.96 | 0.63 | 0.04
| | QR-AE-NET | 0.01(1.99) | 16.00 | 4.23 | 0.31 | 0.06
| σ = 1, d = 1, τ = 0.50 | WQR-LASSO | 0.04(2.47) | 26.00 | 4.65 | 1.09 | 0.04
| | WQR-E-NET | 0.05(2.74) | 2.50 | 3.57 | 0.81 | 0.04
| | WQR-ALASSO | −0.01(2.36) | 43.50 | 4.91 | 1.19 | 0.00
| | WQR-AE-NET | 0.00(2.64) | 3.00 | 4.45 | 1.26 | 0.03
| σ = 0.50, d = 1, τ = 0.50 | WQR-LASSO | 0.02(1.52) | 58.50 | 4.67 | 0.33 | 0.04
| | WQR-E-NET | 0.03(1.78) | 0.00 | 2.02 | 0.29 | 0.04
| | WQR-ALASSO | −0.01(1.19) | 69.00 | 4.86 | 0.43 | 0.00
| | WQR-AE-NET | 0.00(1.41) | 7.50 | 3.21 | 0.40 | 0.00
D6: t-distribution | σ = 1, d = 6, τ = 0.25 | QR-LASSO | 0.82(1.25) | 53.50 | 4.26 | 0.00 | 0.04
| | QR-E-NET | 0.86(1.35) | 3.00 | 2.44 | 0.00 | 0.03
| | QR-ALASSO | 0.81(1.25) | 96.00 | 4.96 | 0.00 | 0.03
| | QR-AE-NET | 0.88(1.29) | 38.50 | 4.22 | 0.00 | 0.04
| σ = 0.50, d = 6, τ = 0.25 | QR-LASSO | 0.40(0.66) | 63.50 | 4.51 | 0.00 | 0.04
| | QR-E-NET | 0.41(0.72) | 2.50 | 2.77 | 0.00 | 0.04
| | QR-ALASSO | 0.38(0.62) | 97.50 | 4.97 | 0.00 | 0.03
| | QR-AE-NET | 0.40(0.66) | 26.00 | 4.15 | 0.00 | 0.05
| σ = 1, d = 6, τ = 0.25 | WQR-LASSO | 0.54(0.92) | 43.00 | 4.23 | 0.00 | 0.04
| | WQR-E-NET | 0.55(0.96) | 2.00 | 3.07 | 0.00 | 0.04
| | WQR-ALASSO | 0.37(0.69) | 98.00 | 4.98 | 0.00 | 0.03
| | WQR-AE-NET | 0.40(0.73) | 59.00 | 4.52 | 0.00 | 0.05
| σ = 0.50, d = 6, τ = 0.25 | WQR-LASSO | 0.26(0.47) | 35.50 | 4.06 | 0.00 | 0.04
| | WQR-E-NET | 0.25(0.51) | 5.00 | 3.16 | 0.00 | 0.03
| | WQR-ALASSO | 0.25(0.42) | 99.50 | 5.00 | 0.00 | 0.04
| | WQR-AE-NET | 0.27(0.43) | 83.50 | 4.84 | 0.00 | 0.06
Table 7. Penalized QR parameter estimation results for the Jet-Turbine Engine data set [32]; bold text indicates superior performance.

Coefficient | True β | NON-BIASED β(Bias) | QR-LASSO β(Bias) | QR-E-NET β(Bias) | QR-ALASSO β(Bias) | QR-AE-NET β(Bias)
τ = 0.25 (selected λ: QR-LASSO 0.01, QR-E-NET 0.01, QR-ALASSO 0.07, QR-AE-NET 0.02)
intercept | −0.72 | −1.11(0.39) | −0.60(−0.12) | −0.69(−0.03) | −0.72(0.00) | −0.67(−0.05)
X1 | 50.00 | 15.15(−34.85) | 33.20(−16.80) | 21.63(−28.37) | 17.55(−32.45) | 24.53(−25.47)
X2 | 0.00 | 84.78(84.78) | 10.34(10.34) | 14.14(14.14) | 26.40(26.40) | 27.17(27.17)
X3 | 0.00 | −89.20(−89.20) | 0.00(0.00) | 0.00(0.00) | −20.53(−20.53) | −26.10(−26.10)
X4 | 10.00 | 28.94(18.94) | 14.18(4.18) | 19.48(9.48) | 27.98(17.98) | 25.27(15.27)
X5 | 15.00 | 30.54(15.54) | 16.51(1.51) | 18.57(3.57) | 22.19(7.19) | 22.54(7.54)
X6 | 0.00 | 10.21(10.21) | −4.28(−4.28) | −4.75(−4.75) | −0.78(−0.78) | 0.57(0.57)
τ = 0.50 (selected λ: QR-LASSO 0.08, QR-E-NET 0.03, QR-ALASSO 0.32, QR-AE-NET 0.10)
intercept | 0.00 | 0.40(0.40) | 0.44(0.44) | 0.39(0.39) | 0.57(0.57) | 0.53(0.53)
X1 | 50.00 | 38.47(−11.53) | 9.68(−40.32) | 16.28(−33.72) | 27.98(−22.02) | 16.09(−33.91)
X2 | 0.00 | 41.62(41.62) | 33.30(33.30) | 17.72(17.72) | 35.97(35.97) | 21.51(21.51)
X3 | 0.00 | −37.77(−37.77) | 0.00(0.00) | 8.14(8.14) | 0.00(0.00) | 0.00(0.00)
X4 | 10.00 | 12.73(2.73) | 21.57(11.57) | 18.00(8.00) | 5.43(−4.57) | 21.60(11.60)
X5 | 15.00 | 19.06(4.06) | 6.02(−8.98) | 12.42(−2.58) | 1.56(−13.44) | 13.52(−1.48)
X6 | 0.00 | 4.21(4.21) | 0.00(0.00) | −3.06(−3.06) | 0.00(0.00) | −0.90(−0.90)

Coefficient | True β | NON-BIASED β(Bias) | WQR-LASSO β(Bias) | WQR-E-NET β(Bias) | WQR-ALASSO β(Bias) | WQR-AE-NET β(Bias)
τ = 0.25 (selected λ: WQR-LASSO 0.01, WQR-E-NET 0.01, WQR-ALASSO 0.00, WQR-AE-NET 0.00)
intercept | −0.72 | −1.11(0.39) | −0.10(−0.62) | −0.08(−0.64) | −0.07(−0.65) | −0.23(−0.49)
X1 | 50.00 | 15.15(−34.85) | 35.34(−14.66) | 26.98(−23.02) | 44.01(−5.99) | 51.84(1.84)
X2 | 0.00 | 84.78(84.78) | 0.00(0.00) | 0.00(0.00) | 0.00(0.00) | −15.12(−15.12)
X3 | 0.00 | −89.20(−89.20) | 0.00(0.00) | 0.25(0.25) | 0.00(0.00) | −14.00(−14.00)
X4 | 10.00 | 28.94(18.94) | 16.13(6.13) | 22.52(12.52) | 10.23(0.23) | 20.28(10.28)
X5 | 15.00 | 30.54(15.54) | 25.85(10.85) | 27.79(12.79) | 23.02(8.02) | 35.00(20.00)
X6 | 0.00 | 10.21(10.21) | −9.56(−9.56) | −10.65(−10.65) | −8.81(−8.81) | −7.73(−7.73)
τ = 0.50 (selected λ: WQR-LASSO 0.04, WQR-E-NET 0.00, WQR-ALASSO 0.97, WQR-AE-NET 0.42)
intercept | 0.00 | 0.40(0.40) | 0.01(0.01) | 0.02(0.02) | 0.01(0.01) | 0.01(0.01)
X1 | 50.00 | 38.47(−11.53) | 49.19(−0.81) | 41.15(−8.85) | 54.16(4.16) | 20.42(−29.58)
X2 | 0.00 | 41.62(41.62) | 0.00(0.00) | 0.00(0.00) | 0.00(0.00) | 10.97(10.97)
X3 | 0.00 | −37.77(−37.77) | 0.00(0.00) | −5.85(−5.85) | 0.00(0.00) | 6.34(6.34)
X4 | 10.00 | 12.73(2.73) | 0.00(−10.00) | 18.47(8.47) | 0.80(−9.20) | 17.20(7.20)
X5 | 15.00 | 19.06(4.06) | 14.63(−0.37) | 21.20(6.20) | 18.96(3.96) | 18.26(3.26)
X6 | 0.00 | 4.21(4.21) | 0.00(0.00) | 4.58(4.58) | 0.00(0.00) | 0.00(0.00)
Note that for these models with no constant term, the intercept, F−1(τ) + β0, translates to 0 and −0.72 under the t6 error-term distribution at the quantile levels τ = 0.50 and τ = 0.25, respectively.
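
The two intercept values quoted in this note follow directly from the t6 quantile function; as a quick check in R (illustrative):

qt(0.50, df = 6)   # 0, the median of the t6 distribution
qt(0.25, df = 6)   # -0.7176..., i.e., approximately -0.72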
Table 8. Penalized QR parameter estimation results for the Gunst and Mason data set; bold text indicates superior performance.

Coefficient | True β | NON-BIASED β(Bias) | QR-LASSO β(Bias) | QR-E-NET β(Bias) | QR-ALASSO β(Bias) | QR-AE-NET β(Bias)
τ = 0.25 (selected λ: QR-LASSO 0.00, QR-E-NET 0.00, QR-ALASSO 0.01, QR-AE-NET 0.00)
intercept | −0.72 | 0.38(−1.10) | −0.85(0.13) | −0.85(0.13) | −0.98(0.26) | −0.69(0.03)
X1 | 0.00 | −0.14(0.14) | −0.02(0.02) | 0.00(0.00) | 0.00(0.00) | −0.19(0.19)
X2 | 8.00 | 9.29(−1.29) | 4.37(3.63) | 4.89(3.11) | 2.97(5.03) | 7.73(0.27)
X3 | −13.00 | −10.97(−2.03) | 0.00(−13.00) | −2.00(−11.00) | −6.35(−6.65) | −9.59(−3.41)
X4 | 0.00 | 11.54(−11.54) | 0.87(−0.87) | 3.01(−3.01) | 6.68(−6.68) | 10.09(−10.09)
X5 | 0.00 | 6.21(−6.21) | 1.62(−1.62) | 2.09(−2.09) | 0.00(0.00) | 4.90(−4.90)
X6 | 6.00 | 1.61(4.39) | 2.22(3.78) | 2.19(3.81) | 3.30(2.70) | 1.74(4.26)
τ = 0.50 (selected λ: QR-LASSO 0.01, QR-E-NET 0.02, QR-ALASSO 0.01, QR-AE-NET 0.01)
intercept | 0.00 | 0.44(−0.44) | 0.03(−0.03) | 0.03(−0.03) | 0.14(−0.14) | 0.07(−0.07)
X1 | 0.00 | 0.72(−0.72) | 0.40(−0.40) | 0.41(−0.41) | 0.00(0.00) | 0.00(0.00)
X2 | 8.00 | 9.87(−1.87) | 10.13(−2.13) | 10.13(−2.13) | 7.83(0.17) | 8.21(−0.21)
X3 | −13.00 | −5.15(−7.85) | −4.80(−8.20) | −4.80(−8.20) | −0.98(−12.02) | −8.41(−4.59)
X4 | 0.00 | 4.68(−4.68) | 4.38(−4.38) | 4.38(−4.38) | 0.00(0.00) | 6.89(−6.89)
X5 | 0.00 | 3.64(−3.64) | 3.71(−3.71) | 3.71(−3.71) | 0.00(0.00) | 0.81(−0.81)
X6 | 6.00 | 4.94(1.06) | 4.92(1.08) | 4.91(1.09) | 6.24(−0.24) | 4.31(1.69)

Coefficient | True β | NON-BIASED β(Bias) | WQR-LASSO β(Bias) | WQR-E-NET β(Bias) | WQR-ALASSO β(Bias) | WQR-AE-NET β(Bias)
τ = 0.25 (selected λ: WQR-LASSO 0.04, WQR-E-NET 0.04, WQR-ALASSO 0.05, WQR-AE-NET 0.06)
intercept | −0.72 | 0.38(−1.10) | −0.16(−0.56) | −0.10(−0.62) | −0.11(−0.61) | −0.11(−0.61)
X1 | 0.00 | −0.14(0.14) | −2.31(2.31) | −2.44(2.44) | 0.00(0.00) | 0.00(0.00)
X2 | 8.00 | 9.29(−1.29) | 8.33(−0.33) | 9.01(−1.01) | 6.43(1.57) | 6.44(1.56)
X3 | −13.00 | −10.97(−2.03) | 0.00(−13.00) | 0.00(−13.00) | −0.50(−12.50) | −0.52(−12.48)
X4 | 0.00 | 11.54(−11.54) | 0.00(0.00) | 0.00(0.00) | 0.00(0.00) | 0.00(0.00)
X5 | 0.00 | 6.21(−6.21) | 0.00(0.00) | 0.00(0.00) | 0.00(0.00) | 0.00(0.00)
X6 | 6.00 | 1.61(4.39) | 9.03(−3.03) | 9.33(−3.33) | 8.42(−2.42) | 8.42(−2.42)
τ = 0.50 (selected λ: WQR-LASSO 0.04, WQR-E-NET 0.04, WQR-ALASSO 0.01, WQR-AE-NET 0.03)
intercept | 0.00 | 0.44(−0.44) | −0.01(0.01) | 0.00(0.00) | 0.00(0.00) | −0.01(0.01)
X1 | 0.00 | 0.72(−0.72) | 0.00(0.00) | 0.00(0.00) | 0.00(0.00) | 0.00(0.00)
X2 | 8.00 | 9.87(−1.87) | 7.48(0.52) | 6.27(1.73) | 8.37(−0.37) | 8.44(−0.44)
X3 | −13.00 | −5.15(−7.85) | −5.75(−7.25) | −5.99(−7.01) | −9.29(−3.71) | −9.67(−3.33)
X4 | 0.00 | 4.68(−4.68) | 0.00(0.00) | 0.00(0.00) | 0.00(0.00) | 0.00(0.00)
X5 | 0.00 | 3.64(−3.64) | −0.18(0.18) | −1.19(1.19) | 0.00(0.00) | 0.00(0.00)
X6 | 6.00 | 4.94(1.06) | 10.68(−4.68) | 11.21(−5.21) | 11.38(−5.38) | 11.32(−5.32)
Note that for these models with no constant term, the intercept, F−1(τ) + β0, translates to 0 and −0.72 under the t6 error-term distribution at the quantile levels τ = 0.50 and τ = 0.25, respectively.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
