Application of Mathematical Programming Techniques in Credit Scoring of Agricultural Loans

Houshmand A. Ziari, David J. Leatham, and Calum G. Turvey

Houshmand A. Ziari is a visiting assistant professor and David J. Leatham is an associate professor of agricultural economics at Texas A&M University, College Station, Texas. Calum G. Turvey is an associate professor of agricultural economics and business, University of Guelph, Ontario, Canada.

Statistical discriminant analysis methods have long been the standard for dealing with classification problems. In recent years, much theoretical research has focused on the application of mathematical programming (MP) techniques to the discriminant problem (e.g., credit scoring). In many experiments with simulated data, researchers have shown that MP techniques rival or outperform statistical discriminant techniques. However, no extensive study has been conducted which compares the performance of alternative MP techniques using real-world data. This article evaluates the classification performance of alternative MP techniques for screening loan applications using logit and Fisher's linear discriminant function models as a performance benchmark. The results show that the MP models perform as well as statistical models. In some cases, one variant of MP model marginally outperforms the statistical models.

Key words: credit scoring, mathematical programming, logit, discriminant analysis.

Article

In classification problems, discriminant analysis (DA) methods are used to classify an individual or object, based on a set of discriminatory variables or attributes, into one of a number of mutually exclusive groups. DA has emerged as an important decision-making tool in many fields and is used extensively in business, biology, the social sciences, and other areas that require classification processes.
It has been widely applied in business fields such as credit scoring (Srinivasan and Kim; Turvey), bankruptcy assessment (Mahmood and Lawrence), prediction of various events including credit card usage and tender offer outcomes, and personnel evaluation or selection of employees (Eisenbeis). DA methods include parametric (statistical) discriminant models (e.g., logit, probit, linear discriminant, and quadratic discriminant models) and nonparametric discriminant models (e.g., mathematical programming and neural networks). Historically, statistical DA methods have been the standard for dealing with classification problems. However, in recent years, many researchers have expressed concern about certain features of statistical DA models. In particular, statistical DA methods require restrictive assumptions of distributional form. For example, Fisher's linear discriminant model, which perhaps is the most widely used DA method, requires the assumption of multivariate normal populations with the same variance/covariance structure. Unfortunately, violations of these assumptions occur regularly. Eisenbeis argues that deviations from the normality assumptions, at least in economics and finance, are more likely the rule rather than the exception (p. 875). For example, the financial ratios normally used in credit scoring are rarely normally distributed. In addition, most empirical data include qualitative variables that cannot be multivariate normal (Goldstein and Dillon). Finally, the classification performance of statistical DA models may be adversely affected when underlying parametric assumptions are violated (Baladrishnan and Subrahmanian; Lachenbruch, Sneeinger, and Revo; Press and Wilson). The statistical DA models also assume that misclassification costs are the same for all groups (type I and type II errors have equal significance). 
For example, the cost of turning down a good loan (type I error) and the cost of accepting a bad loan (type II error) are assumed to be the same. Furthermore, statistical DA models are not apt to adequately handle a complex discrimination problem. In certain situations, a side constraint might be necessary which would prohibit the use of statistical DA models. These shortcomings of statistical DA models have prompted researchers to explore the use of several nonparametric DA techniques such as neural network, mathematical programming, and search methods. This article focuses exclusively on the mathematical programming (MP) DA techniques. In recent years, considerable theoretical research has been devoted to the use of MP techniques for solving classification problems. Freed and Glover (1981a, b) and Hand were the first to introduce the use of MP in DA. Glover, Keene, and Duea argue that the MP approach to DA offers certain advantages over the statistical DA models. These include:
1 For a linear model, $Y = X\beta + e$, L1 and L2 represent special cases of the Lp estimator that minimizes $\sum_i |Y_i - X_i\beta|^p$. The solutions to this problem, when the p-values are equal to 1 and 2, can be derived by using L1 and L2 estimators, respectively. Examples of L1 estimators are the least absolute value estimator and the least absolute residual estimator. The values for L1 metric estimators are obtained by minimizing the sum of the absolute value of residuals. The minimization problem for L1 can be cast as a linear programming problem. An example of an L2 estimator is ordinary least squares. The values for L2 estimators are obtained by minimizing the sum of the squared residuals. Since the residuals are squared, very large residuals or outliers may have a significant influence on the parameters obtained using L2 estimators.

In several experiments utilizing Monte Carlo simulation data, researchers have found that some MP techniques rival or outperform the statistical DA techniques in terms of the relative classification performance (Bajgier and Hill; Freed and Glover 1986; Joachimsthaler and Stam; Rubin). This is especially true when the underlying assumptions of statistical DA models are not satisfied. In spite of these experimental results, our review of the literature found no extensive study that compares the performance of alternative MP models using real-world data. Only three studies (Hardy and Adrian; Mahmood and Lawrence; and Srinivasan and Kim) have applied MP discriminant models to actual business data and compared the classification performance of statistical models with MP models. However, these works only compared the statistical models to one variant of MP techniques. Further, the MP models used were a rudimentary form of general MP models that have been found to perform poorly in practice (Bajgier and Hill; Markowski and Markowski 1987). Moreover, none of the studies attempted to take advantage of the inherent flexibility of MP models.
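The contrast that footnote 1 draws between L1 and L2 estimators can be made concrete with a small sketch. It casts the least absolute value fit as a linear program and compares it with ordinary least squares on data containing a single outlier. The data, the variable names, and the use of SciPy's `linprog` are illustrative assumptions and are not taken from the article.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: y = 1 + 2x exactly, except for one gross outlier.
x = np.arange(6, dtype=float)
y = 1.0 + 2.0 * x
y[-1] += 20.0                          # contaminate the last observation
design = np.column_stack([np.ones_like(x), x])
n, k = design.shape

# L1 fit as an LP: write each residual as u_i - v_i with u_i, v_i >= 0
# and minimize sum(u_i + v_i) subject to design @ beta + u - v = y.
cost = np.concatenate([np.zeros(k), np.ones(2 * n)])
a_eq = np.hstack([design, np.eye(n), -np.eye(n)])
bounds = [(None, None)] * k + [(0, None)] * (2 * n)
lad = linprog(cost, A_eq=a_eq, b_eq=y, bounds=bounds, method="highs")
beta_l1 = lad.x[:k]

# L2 fit: ordinary least squares.
beta_l2, *_ = np.linalg.lstsq(design, y, rcond=None)

print("L1 (least absolute value):", beta_l1)
print("L2 (least squares):       ", beta_l2)
```

In this constructed example the L1 fit stays on the line through the five clean points, while the squared loss pulls the L2 slope well away from it, which is the outlier sensitivity the footnote describes.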
The purpose of this article is to present and compare alternative MP formulations and apply them to actual business data. Specifically, the objective of this article is to evaluate alternative MP techniques in credit scoring of agricultural loans using statistical DA models, namely Fisher's linear discriminant model (FLDM) and the logit discriminant model (LDM), as a performance benchmark. The MP and statistical DA models are compared on the basis of classification ability on in-sample and hold-out sample data sets.

The remainder of the article is organized into the following major sections. First, a two-group discrimination problem is discussed. Next, a brief discussion of statistical DA models is provided, followed by a presentation of MP discriminant models. Data and variable selection are then reported. Finally, we compare the classification performance of statistical and MP models, and then offer our conclusions.

Two-Group Discrimination Problem

The two-group discriminant problem deals with discrimination between two predefined groups and is the fundamental problem addressed by DA. A two-group discriminant problem assumes that there are two well-defined populations, G1 and G2 (e.g., good loans vs. bad loans), and it is possible to measure j discriminatory variables or attributes for each member of either population. The focus of DA is the determination of a numerical rule or discriminant function that allows the investigator to distinguish between two populations using the j attributes. A linear discriminant function can be expressed by
$$Z_i = X_0 + \sum_{j} X_j B_{ij} \qquad (1)$$

where X0 is a constant term, Xj is the weight assigned to variable j, Bij is the value of the jth variable for the ith individual, and Zi is the discriminant value for the ith individual. For a given cutoff or boundary value of b, the classification rule then becomes: If Zi Ž b, then individual i is assigned to group G1; otherwise, individual i is assigned to group G2. The cutoff value does not have to be the same for both groups. But, for simplicity here, we assume the cutoff values for both groups are the same. The goal of any DA model is to estimate parameters X and b so as to minimize the number of misclassifications for in-sample and/or hold-out sample data sets. DA models differ from each other in their choice of criterion function and/or distributional assumption(s). However, in all DA models, X and b are determined from a set of observations for which group membership is known.

Statistical Linear Discriminant Analysis

An extensive body of literature exists in which the statistical DA models are discussed. Interested readers are referred to Altman, Avery, Eisenbeis, and Sinkey, and to Maddala for a detailed discussion of statistical models in classification studies. For a discussion of credit scoring models and the theoretical consideration of credit scoring in agriculture, the interested reader is referred to Betubiza and Leatham; Miller and LaDue; Turvey; Chhikara; and Turvey and Brown. The statistical procedures of FLDM and LDM have been discussed extensively in the literature, and their detailed formulations are not repeated here.

Fisher Linear Discriminant Model

The FLDM procedure computes the linear discriminant function (1) by maximizing the ratio of the between-group variance to the within-group variance.
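As a rough illustration of the variance-ratio criterion, the sketch below computes Fisher weights in the common textbook form, using a pooled covariance matrix and the difference of the group means. That form, the midpoint cutoff (which presumes equal priors and equal misclassification costs), the convention that positive scores indicate group 1, the function name `fisher_ldf`, and the toy data are all assumptions made for this example and are not taken from the article.

```python
import numpy as np

def fisher_ldf(g1, g2):
    """Return (weights, midpoint) for a two-group Fisher discriminant.

    Assumed textbook form: weights = pooled_cov^{-1} (mean1 - mean2).
    """
    n1, n2 = len(g1), len(g2)
    mu1, mu2 = g1.mean(axis=0), g2.mean(axis=0)
    s1 = np.cov(g1, rowvar=False)      # sample covariance of group 1
    s2 = np.cov(g2, rowvar=False)      # sample covariance of group 2
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    weights = np.linalg.solve(pooled, mu1 - mu2)
    midpoint = weights @ (mu1 + mu2) / 2.0
    return weights, midpoint

# Hypothetical observations on two attributes for each group.
g1 = np.array([[2.0, 3.0], [3.0, 3.5], [2.5, 4.0], [3.5, 4.5]])
g2 = np.array([[0.0, 1.0], [1.0, 0.5], [0.5, 1.5], [1.5, 2.0]])

weights, midpoint = fisher_ldf(g1, g2)
scores1 = g1 @ weights - midpoint      # positive scores point to group 1
scores2 = g2 @ weights - midpoint      # negative scores point to group 2
print("weights:", weights)
print("group 1 scores:", scores1)
print("group 2 scores:", scores2)
```

With unequal priors or costs, the zero threshold on the centered score would shift by a log-ratio term of the kind the cutoff formula for the FLDM describes.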
The derived linear discriminant function is known to be optimal in the context of minimizing the total probability of misclassifications, provided the following conditions are held: (a) the distributions of the variables are multivariate normal, and (b) the variances-covariances of the variables are the same for both population groups (Johnson and Wichern). The coefficients for the FLDM are estimated by
where Sg and μg are the variance-covariance matrix and mean vector for group g (g = 1, 2), respectively, and ng is the number of observations in group g. The cutoff value for the FLDM is calculated by $b = \ln\left(c_1 p / (c_2 (1 - p))\right)$, where c1 and c2 represent the misclassification costs for groups 1 and 2, and p is the prior probability that the individual comes from group 1. The cutoff value for an FLDM is equal to zero if the prior probability of group membership and the misclassification costs are the same.

Logit Discriminant Model

Some statistical DA models, such as the LDM and probit, define the discriminator value Zi as a probability. The LDM assumes a logistic distribution function to represent the probability that an individual i belongs to group g:
where F(Zi) converts the value of Zi to a probability value. The maximum-likelihood technique generally is used to estimate the weights (Maddala). The selection of the cutoff value for the LDM is rather arbitrary. Typically, if the estimated probability is greater than 0.5, then the first alternative is selected (Amemiya).

Mathematical Programming Discriminant Analysis Models

The MP approach to discriminant problems, like statistical DA models, attempts to construct a discriminant function or a separating hyperplane to classify an individual or an object into a prespecified group. For a two-group problem, the objective is to determine a weighting vector X and a scalar b, so that the model assigns as correctly as possible the individuals of group 1 to one side of the separation hyperplane and the individuals of group 2 to the other side. Stated mathematically, the objective of an MP model is to find b and nonzero X, satisfying:
and
where Ag is an ng × j matrix of observations, and i = 1, 2, ..., N, where N is the total number of observations (N = n1 + n2). The separating hyperplane, AX = b, provides the boundary between two groups. When the two groups are not linearly separable, a criterion is needed to separate the group classifications. Then the MP formulation of a discriminant problem can be cast as:
s.t.:
where F (X, b ) is the criterion function. The objective of this problem is to determine X and b that will optimize a certain criterion function. To develop the criterion function, deviation variables can be incorporated into (7) and (8):
s.t.:
where Eg and Ig are deviation variables. (Glover et al. label them external and internal deviations, respectively.) A deviation is said to be external/internal if its associated observation is incorrectly/correctly classified (i.e., falls on the wrong/right side of the separating hyperplane). External/internal deviations represent the extent to which an observation is incorrectly/correctly classified. Thus, external deviations are undesirable, while internal deviations are desirable. The above problem can be modified easily to handle multi-group classifications, as demonstrated by Freed and Glover (1981b) and by Gehrlein. Researchers have recently developed assorted MP models to deal with classification problems depending on the choice of a criterion function. Among the MP models are the minimize the sum of distances (MSD), the minimize the maximum distance (MMD), the mixed-integer (MIP), and the general Lp distance approaches. Various combinations of these basic methods have been proposed in the literature. Koehler and Erenguc (1990b) provide a comprehensive survey of various MP model formulations. As noted earlier, some of these models have proved to yield promising predictive power in studies using simulated data (Bajgier and Hill; Freed and Glover 1986; Joachimsthaler and Stam; Rubin) and also those using real data (Mahmood and Lawrence; Srinivasan and Kim). In the last few years, research has identified certain MP discriminant models that, under certain data conditions, exhibit pathological problems not found in the applications of MP in other fields. Glover et al. classified these problems under the headings of degeneracy and stability. The solution to MP is said to be degenerate, or unacceptable, if X = 0. The solution is unacceptable since all observations will be assigned to one group. The resultant discriminant functions lack any discriminatory power. 
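The degeneracy problem can be reproduced with a small linear program. The sketch below minimizes the sum of external deviations for two overlapping groups, first without any normalization and then with the weights forced to sum to one. The sign convention (group 1 on the "less than or equal to b" side), the sum-to-one normalization, the function name `solve_msd`, and the toy data are assumptions made for illustration, since the constraint equations themselves are not reproduced in this text.

```python
import numpy as np
from scipy.optimize import linprog

def solve_msd(g1, g2, normalize):
    """Minimize total external deviation; return (weights, cutoff, objective)."""
    n1, j = g1.shape
    n2 = g2.shape[0]
    # Variable order: X (j weights), b (cutoff), E1 (n1), E2 (n2).
    cost = np.concatenate([np.zeros(j + 1), np.ones(n1 + n2)])
    # Assumed group 1 rule: A1 X - E1 <= b, written as A1 X - b - E1 <= 0.
    rows1 = np.hstack([g1, -np.ones((n1, 1)), -np.eye(n1), np.zeros((n1, n2))])
    # Assumed group 2 rule: A2 X + E2 >= b, written as -A2 X + b - E2 <= 0.
    rows2 = np.hstack([-g2, np.ones((n2, 1)), np.zeros((n2, n1)), -np.eye(n2)])
    a_eq = b_eq = None
    if normalize:
        # Illustrative normalization: the weights must sum to one.
        a_eq = np.concatenate([np.ones(j), np.zeros(1 + n1 + n2)]).reshape(1, -1)
        b_eq = np.array([1.0])
    bounds = [(None, None)] * (j + 1) + [(0, None)] * (n1 + n2)
    res = linprog(cost, A_ub=np.vstack([rows1, rows2]), b_ub=np.zeros(n1 + n2),
                  A_eq=a_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return res.x[:j], res.x[j], res.fun

# Hypothetical overlapping groups: (3, 3) from group 1 lies inside the
# convex hull of group 2, so no hyperplane separates the groups.
g1 = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
g2 = np.array([[3.0, 4.0], [4.0, 3.0], [4.0, 4.0], [2.0, 2.0]])

w_free, b_free, obj_free = solve_msd(g1, g2, normalize=False)
w_norm, b_norm, obj_norm = solve_msd(g1, g2, normalize=True)
print("no normalization:  ", w_free, b_free, obj_free)
print("with normalization:", w_norm, b_norm, obj_norm)
```

Without the normalization the solver can drive every deviation to zero only by setting X = 0 and b = 0, which classifies nothing. With the normalization the weights are nonzero and the objective reports a positive amount of external deviation.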
The stability problem is referred to as a situation where the solutions are not invariant to linear data translation and transformation. For a theoretical discussion of these problems, see Koehler (p. 19, 89b); Markowski and Markowski (1985); Freed and Glover (1986a); and Glover et al. Early MP models constrained b to be a constant to avoid the unacceptable solutions. It was tacitly assumed that choice of b would just scale the solutions. Further research determined that this is not the case, however, and choice of b still leads to X = 0 for certain data configurations (Glover). Recently, several normalization alternatives have been suggested to overcome these anomalies. Details regarding alternative normalizations can be found in Koehler (1990). Since it is possible that normalization eliminates a feasible region with potential optimal solutions, a user should be cautious when employing normalization. To this end, Rubin recommends:
In this study, normalization was incorporated into the MP models if it was deemed to be necessary. In the remainder of this section, we present four variants of MP discriminant models. These models are chosen among alternative MP models based on their competitive classification power and also their appropriateness for dealing with the credit scoring problem. The first MP model, hereafter referred to as the MSD model, can be summarized as follows:
s.t.:
where eg and d are 1 × ng and 1 × j matrices of ones, respectively, and Eg has dimension ng × 1. X and b are unrestricted in sign. The MSD model minimizes the sum of exterior deviations from the hyperplane. Equation (17), a normalization constraint suggested by Freed and Glover (1986b), is included to overcome the difficulties associated with unacceptable solutions. The normalization constraint requires the sum of all coefficients to be equal to some arbitrary (positive) constant (1 is used here). The constant term is only a scaling constant and does not affect the classification rates. The MSD model, without normalization constraint (17), was originally reported by Freed and Glover (1981b). The second MP model used in this study, denoted the optimize sum of distances (OSD) by Bajgier and Hill, has the following form:
s.t.:
The MSD model is similar to the OSD. Both models attempt to minimize the sum of external deviations from the hyperplane. But, in the OSD model, the cutoff value is preassigned to be equal to an arbitrary number (1 is used here), which precludes the need for a normalization constraint. The third MP model, hereafter referred to as the hybrid model (HB), seeks to:
s.t.:
where hg and mg are 1 × ng matrices of nonnegative objective coefficients. The objective function of the HB model maximizes the weighted sum of interior deviations and minimizes the weighted sum of exterior deviations. Constraint (28) is included in the model formulation to prevent potential unbounded solutions. In practice, hg and mg may reflect the relative importance of incorrect/correct classification to a particular group or individuals in the group. By modifying these weights and parameters, usually by LP post-optimization techniques as proposed by Glover, the solution might be tailored to meet a decision maker's specific goals. In other words, it might be possible to find a set of weights that achieves balancing of errors for a decision maker's particular set of data (Markowski). The HB model was first presented by Glover et al.; they identified it as a hybrid model because it can encompass several variations of MP models by setting the corresponding weights equal to either +∞ or −∞. However, the HB model presented here is different from the model presented by Glover et al. For simplicity, the maximum exterior deviation and the minimum interior deviation were deleted from our model formulation. The final MP variant used in this study is a mixed-integer programming model (MIP). The MIP model has the form:
s.t.:
where hg denotes the misclassification costs associated with group g, Yi is a binary variable that equals one if individual i is misclassified and zero otherwise, and q is a large positive number. The objective function of the MIP model minimizes total misclassification costs. The interesting feature of the MIP model, as noted by Bajgier and Hill, is that it is the only model that directly attacks the goal of minimizing the number of misclassifications; all other DA models (including parametric and nonparametric models) use a surrogate criterion function to achieve the goal. If misclassification costs for both groups are the same, then MIP directly minimizes the number of misclassifications. All other models, by contrast, minimize the amount or extent of misclassification from the hyperplane, which might not be intuitively appealing to users. Another interesting feature of the MIP is that a constraint can be easily incorporated into the model to balance the number of misclassifications for each group. In spite of its potential, the MIP model has not been widely utilized by researchers and practitioners because of a large computational cost and lack of efficient software. Koehler and Erenguc (1990a) recently developed a special-purpose, mixed-integer algorithm which takes advantage of the problem's structure. Moreover, because of the recent decrease in computing costs and increase in computing power, some general-purpose, mixed-integer program packages can now be conveniently applied to solve larger MIP problems.2

2 Readers interested in seeing a numerical presentation of mathematical programming of a discriminant analysis model can refer to Freed and Glover (1981b), and to Hardy and Adrian.

Data and Variable Selection

To perform a comparative analysis, the above models were applied to estimate the corresponding discriminant functions using a sample of credit application data.
The classification powers of these models were then tested based on their performance using in-sample and hold-out sample data. The credit application data used in this study were collected by Canada's Farm Credit Corporation. The data are from actual 1981, 1982, and 1983 loan applications for which loans were made in the Saskatchewan Province. The applicants in group 1 consist of individuals with recent histories of delinquent credit payments (noncurrent loans), and applicants in group 2 consist of those individuals without recent delinquent credit payments (current loans) based on the status of the loan as of March 1990. The sample consisted of 754 current loan applications (38%) and 1,245 noncurrent loan applications (62%). The sample data were divided into two subsamples—an in-sample data set and a hold-out sample data set. The in-sample data set was used for model development, and the resulting models were then compared using the in-sample and hold-out sample sets. In this study, 60% (1,199 loans) of the total sample was used for model estimation. The usual procedure in credit scoring studies is to select a large group of explanatory variables and reduce it to a smaller number of statistically significant variables. The above data set was recently used by Turvey in a study in which he compared alternative statistical credit scoring models. Our investigation included only the explanatory variables used in his study to avoid potential overfitting biases. Definitions of the explanatory variables are presented in Table 1. More formal definitions and explanations of the explanatory variables are provided in Turvey and in Turvey and Brown. The HB and MIP models require the parameters hg and mg to be specified. As discussed earlier, in practice, these parameters could be solicited from the user. 
Since the actual benefits and costs of external and internal deviations from the hyperplane for current and noncurrent loan applicants were not available for this study, a set of arbitrary values was selected for these parameters. Subsequently, four variants of the HB model (denoted by HB-1, HB-2, HB-3, and HB-4) and three variants of the MIP model (denoted by MIP-1, MIP-2, MIP-3) were tested. The objective coefficients associated with variants of the HB and MIP models are given in Table 2. The HB-1 model maximizes the total interior distances from the hyperplane and minimizes the total exterior distances from the hyperplane. The HB-2, HB-3, and HB-4 models maximize the weighted sum of interior distances and minimize the weighted sum of exterior distances from the hyperplane. The objective is to provide a better balance of errors between current and noncurrent loans by varying the objective coefficients assigned to interior and/or exterior deviations. As noted earlier, in contrast to other MP models, the objective function of the MIP model has a direct and meaningful interpretation. For example, the MIP-1 model assumes that the misclassification costs for current and noncurrent loans are the same; hence, the MIP-1 model directly minimizes the number of misclassifications for both groups. The MIP-2 model is similar to the MIP-1, but the weights are proportionally weighted based on sample size in each group. The MIP-3 model, however, assumes that the misclassification cost for a noncurrent loan is twice as much as that for a current loan. Overall, we tested 11 models—two statistical (parametric) and nine nonparametric models. The classification performance results of these models are reported in the next section.
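To show how group-specific misclassification costs change a mixed-integer solution, the sketch below solves a small big-M model with SciPy's `milp`. It is an illustrative formulation and not the article's own: the sign convention (group 1 on the "less than or equal to b" side), the small separation gap used in place of a normalization, the bounds on X and b, the function name `solve_mip`, and the toy data are all assumptions. The two cost settings loosely mirror the equal-cost case and the case in which a group 1 error costs twice as much as a group 2 error.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def solve_mip(g1, g2, cost1, cost2, big_q=100.0, gap=0.1):
    """Minimize total misclassification cost; return (cost, flags1, flags2)."""
    n1, j = g1.shape
    n2 = g2.shape[0]
    # Variable order: X (j weights), b (cutoff), Y (n1 + n2 binary flags).
    c = np.concatenate([np.zeros(j + 1), np.full(n1, cost1), np.full(n2, cost2)])
    # Assumed group 1 rule: A1 X <= b - gap unless Y_i = 1 relaxes the row.
    rows1 = np.hstack([g1, -np.ones((n1, 1)), -big_q * np.eye(n1), np.zeros((n1, n2))])
    # Assumed group 2 rule: A2 X >= b + gap unless Y_i = 1 relaxes the row.
    rows2 = np.hstack([-g2, np.ones((n2, 1)), np.zeros((n2, n1)), -big_q * np.eye(n2)])
    constraint = LinearConstraint(np.vstack([rows1, rows2]), -np.inf, -gap)
    integrality = np.concatenate([np.zeros(j + 1), np.ones(n1 + n2)])
    # Bounding X and b keeps the big-M value valid for this toy data.
    lower = np.concatenate([-np.ones(j), [-10.0], np.zeros(n1 + n2)])
    upper = np.concatenate([np.ones(j), [10.0], np.ones(n1 + n2)])
    res = milp(c, constraints=constraint, integrality=integrality,
               bounds=Bounds(lower, upper))
    flags = np.round(res.x[j + 1:]).astype(int)
    return res.fun, flags[:n1], flags[n1:]

# Hypothetical overlapping groups (group 1 plays the role of noncurrent loans).
g1 = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
g2 = np.array([[3.0, 4.0], [4.0, 3.0], [4.0, 4.0], [2.0, 2.0]])

equal_cost, eq_flags1, eq_flags2 = solve_mip(g1, g2, cost1=1.0, cost2=1.0)
skew_cost, sk_flags1, sk_flags2 = solve_mip(g1, g2, cost1=2.0, cost2=1.0)
print("equal costs:         ", equal_cost, eq_flags1, eq_flags2)
print("group 1 costs double:", skew_cost, sk_flags1, sk_flags2)
```

With equal costs the objective is simply the number of misclassified observations. Doubling the group 1 cost steers the single unavoidable error in this toy data onto a group 2 observation, which is the kind of error balancing that the cost parameters are meant to control.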
Note: n.a. = not applicable.
a HB = hybrid model; MIP = mixed integer model.

Classification Results

The classification performance on the in-sample and hold-out samples of the alternative models is presented in Tables 3 and 4. Table 3 presents the classification performance in terms of number of loans in calibration and hold-out samples, while Table 4 presents the same results by percentage. As can be seen from Tables 3 and 4, classification performances for the parametric models are not significantly different from each other. Both the LDM and the FLDM, however, predict better than a pure naive model, i.e., predict better than the proportional prior probabilities for current loans (36.4%) and noncurrent loans (63.6%). But the MSD and OSD models perform significantly worse than the LDM, the FLDM, and the naive model for both calibration and hold-out samples. Among the four HB models tested, the classification performance of the HB-4 model in the hold-out sample is worse than that of the other three HB models. The results reported in Tables 3 and 4 suggest that the HB-2, HB-3, and HB-4 models perform as well as statistical models in hold-out samples. However, all three MIP models perform marginally better than the LDM and the FLDM for calibration and hold-out samples. The LDM correctly classified 601 noncurrent loans in the calibration sample, for a correct classification rate of 65%. The overall correct classification rate for the LDM is 66% for the calibration sample (Table 4). The MIP-1, MIP-2, and MIP-3 models correctly classified 622, 609, and 609 of the noncurrent loans in the calibration sample, for a correct classification rate of 66%, 68%, and 68%, respectively. The overall correct classification rates for the MIP-1, MIP-2, and MIP-3 are 67%, 67%, and 68%, respectively, for the calibration sample, which are marginally better than the corresponding rates for the LDM and FLDM.
Nevertheless, the results presented in Tables 3 and 4 show that both the LDM and FLDM provide a more balanced discriminant solution than the MP models. None of the MP models tested here show a higher correct classification rate for current loans than the LDM and FLDM.

Conclusions

The purpose of this study was to compare the alternative statistical and MP credit scoring models in an empirical setting using actual credit data. The results indicate that there were only small differences in the classifying accuracy of statistical and MP approaches. The results of this investigation reinforce the findings of the experimental studies which claim that MP models are as competitive as statistical DA models. As shown here, the MIP models even outperform the statistical models. The principal disadvantage of the MP approach is that MP produces estimates without statistical properties (e.g., standard errors, t-ratios, etc.); thus, no hypothesis testing can be performed. However, if the objective is to estimate a discriminant function that provides the fewest classification errors, this weakness should not dilute the potential of MP models as presented here. We recommend the use of MP models in an applied environment when the incorporation of a side condition becomes necessary, when only a small sample size is available, or when the data set is heavily contaminated. In these situations, the MP models have the potential to perform better than the statistical DA models. Since there is no optimal DA model which fits all data sets in all situations, it may be good practice to fit alternative parametric and nonparametric DA models to the data and then choose the best model. In many credit scoring applications, even a moderate improvement in the ability to correctly classify may represent a significant increase in financial contributions due to a decrease in losses from making bad loans.
a LDM = logit discriminant model, FLDM = Fisher linear discriminant model, MSD = minimize sum of the deviations, OSD = optimize sum of distances, HB = hybrid model, and MIP = mixed integer model. b Current and noncurrent loans.
AEM Home © 2002
Cornell University |