Linear regression
Generate predictions using an easily interpreted mathematical formula
What is linear regression?
Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable’s value is called the independent variable.
This form of analysis estimates the coefficients of the linear equation, involving one or more independent variables, that best predict the value of the dependent variable. Linear regression fits a straight line or surface that minimizes the discrepancies between predicted and actual output values. Simple linear regression calculators use a “least squares” method to discover the best-fit line for a set of paired data. You can then estimate the value of Y (the dependent variable) from X (the independent variable).
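For a single predictor, the least squares fit has a simple closed form: with sample means \(\bar x\) and \(\bar y\), the slope \(b\) and intercept \(a\) of the line \(y = a + bx\) that minimize the sum of squared residuals are

\[b = \frac{\sum_j (x_j - \bar x)(y_j - \bar y)}{\sum_j (x_j - \bar x)^2}, \qquad a = \bar y - b\,\bar x\]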
Generate predictions more easily
You can perform linear regression in Microsoft Excel or use statistical software packages such as IBM SPSS® Statistics that greatly simplify the process of working with linear-regression equations, models and formulas. SPSS Statistics can be leveraged for techniques such as simple linear regression and multiple linear regression.
You can perform the linear regression method in a variety of programs and environments, including:
- R
- MATLAB
- Python (including scikit-learn)
- Excel
Why linear regression is important
Linear-regression models are relatively simple and provide an easy-to-interpret mathematical formula that can generate predictions. Linear regression can be applied to various areas in business and academic study.
You’ll find that linear regression is used in everything from the biological, behavioral, environmental and social sciences to business. Linear-regression models have become a proven way to scientifically and reliably predict the future. Because linear regression is a long-established statistical procedure, the properties of linear-regression models are well understood, and the models can be trained very quickly.
A proven way to scientifically and reliably predict the future
Business and organizational leaders can make better decisions by using linear regression techniques. Organizations collect masses of data, and linear regression helps them use that data to better manage reality — instead of relying on experience and intuition. You can take large amounts of raw data and transform it into actionable information.
You can also use linear regression to provide better insights by uncovering patterns and relationships that your business colleagues might have previously seen and thought they already understood. For example, performing an analysis of sales and purchase data can help you uncover specific purchasing patterns on particular days or at certain times. Insights gathered from regression analysis can help business leaders anticipate times when their company’s products will be in high demand.
Key assumptions of effective linear regression
Assumptions to be considered for success with linear-regression analysis:
- For each variable: Consider the number of valid cases, mean and standard deviation.
- For each model: Consider regression coefficients, correlation matrix, part and partial correlations, multiple R, R2, adjusted R2, change in R2, standard error of the estimate, analysis-of-variance table, predicted values and residuals. Also, consider 95-percent-confidence intervals for each regression coefficient, variance-covariance matrix, variance inflation factor, tolerance, Durbin-Watson test, distance measures (Mahalanobis, Cook and leverage values), DfBeta, DfFit, prediction intervals and case-wise diagnostic information.
- Plots: Consider scatterplots, partial plots, histograms and normal probability plots.
- Data: Dependent and independent variables should be quantitative. Categorical variables, such as religion, major field of study or region of residence, need to be recoded to binary (dummy) variables or other types of contrast variables.
- Other assumptions: For each value of the independent variable, the distribution of the dependent variable must be normal. The variance of the distribution of the dependent variable should be constant for all values of the independent variable. The relationship between the dependent variable and each independent variable should be linear and all observations should be independent.
Curve Fitting: Linear Regression
Regression is all about fitting a low-order parametric model or curve to data, so we can reason about it or make predictions on points not covered by the data. Both the data and the model are known, but we’d like to find the model parameters that make the model fit the data best, or at least well enough, according to some metric.
We may also be interested in how well the model supports the data, or whether we should look for another, more appropriate model.
In a regression, a lot of data is reduced and generalized into a few parameters. The resulting model can obviously no longer reproduce all the original data exactly – if you need the data to be reproduced exactly, have a look at interpolation instead.
Simple Regression: Fit to a Line
In the simplest yet still common form of regression we would like to fit a line \(y : x \mapsto a + b x\) to a set of points \((x_j, y_j)\), where \(x_j\) and \(y_j\) are scalars. Assuming we have two double arrays for x and y, we can use Fit.Line to evaluate the \(a\) and \(b\) parameters of the least squares fit:
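A minimal sketch with illustrative sample data (depending on the library version, Fit.Line returns the intercept/slope pair as a tuple, deconstructed here):

```csharp
using MathNet.Numerics;

// Illustrative sample data, chosen to lie exactly on the line y = 10 + 0.5*x.
double[] xdata = { 10, 20, 30 };
double[] ydata = { 15, 20, 25 };

// Least squares fit; a is the intercept, b the slope.
var (a, b) = Fit.Line(xdata, ydata);
// a == 10.0, b == 0.5
```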
How well do these parameters fit the data? The data points happen to be positioned exactly on a line. Indeed, the coefficient of determination confirms the perfect fit:
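Continuing the sketch above, one way to compute it in Math.NET Numerics is GoodnessOfFit.RSquared, which compares the modelled values against the observed ones:

```csharp
using System.Linq;

// 1.0 indicates a perfect fit of the modelled to the observed values.
double rSquared = GoodnessOfFit.RSquared(
    xdata.Select(x => a + b * x), // modelled values
    ydata);                       // observed values
```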
Linear Model
In practice, a line is often not an adequate model. But if we can choose a model that is linear, we can leverage the power of linear algebra; otherwise we have to resort to iterative methods (see Nonlinear Optimization).
A linear model can be described as a linear combination of \(N\) arbitrary but known functions \(f_i(x)\), scaled by the model parameters \(p_i\). Note that none of the functions \(f_i\) depends on any of the \(p_i\) parameters.
\[y : x \mapsto p_1 f_1(x) + p_2 f_2(x) + \cdots + p_N f_N(x)\]
If we have \(M\) data points \((x_j, y_j)\), then we can write the regression problem as an overdetermined system of \(M\) equations:

\[\begin{aligned} y_1 &= p_1 f_1(x_1) + p_2 f_2(x_1) + \cdots + p_N f_N(x_1) \\ y_2 &= p_1 f_1(x_2) + p_2 f_2(x_2) + \cdots + p_N f_N(x_2) \\ &\;\;\vdots \\ y_M &= p_1 f_1(x_M) + p_2 f_2(x_M) + \cdots + p_N f_N(x_M) \end{aligned}\]
Or in matrix notation, with the predictor matrix \(\mathbf X\) and the response \(\mathbf y\):

\[\begin{bmatrix} f_1(x_1) & f_2(x_1) & \cdots & f_N(x_1) \\ f_1(x_2) & f_2(x_2) & \cdots & f_N(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ f_1(x_M) & f_2(x_M) & \cdots & f_N(x_M) \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix}, \quad \text{i.e.}\quad \mathbf X \mathbf p = \mathbf y\]
Provided the dataset is small enough, this can be transformed to the normal equations \(\mathbf X^T \mathbf X \mathbf p = \mathbf X^T \mathbf y\) and solved efficiently by a Cholesky decomposition (do not use matrix inversion!), e.g. with MultipleRegression.NormalEquations.
Using normal equations is comparatively fast, as it dramatically reduces the size of the linear algebra problem to be solved, but that comes at the cost of less precision. If you need more precision, try using MultipleRegression.QR or MultipleRegression.Svd instead, with the same arguments.
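A minimal sketch of solving such a system directly; the design matrix and response below are illustrative:

```csharp
using MathNet.Numerics.LinearAlgebra;
using MathNet.Numerics.LinearRegression;

// Illustrative design matrix X (intercept column plus one regressor) and response y.
Matrix<double> X = Matrix<double>.Build.DenseOfArray(new double[,]
{
    { 1.0, 10.0 },
    { 1.0, 20.0 },
    { 1.0, 30.0 },
});
Vector<double> y = Vector<double>.Build.Dense(new[] { 15.0, 20.0, 25.0 });

// Solves the normal equations (Cholesky internally).
Vector<double> p = MultipleRegression.NormalEquations(X, y);

// Higher-precision alternatives, with the same arguments:
Vector<double> pQr = MultipleRegression.QR(X, y);
Vector<double> pSvd = MultipleRegression.Svd(X, y);
```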
Polynomial Regression
To fit to a polynomial we can choose the following linear model with \(f_i(x) := x^i\):
\[y : x \mapsto p_0 + p_1 x + p_2 x^2 + \cdots + p_N x^N\]
The predictor matrix of this model is the Vandermonde matrix. There is a special function in the Fit class for regressions to a polynomial, but note that regression to high order polynomials is numerically problematic.
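That function is Fit.Polynomial in Math.NET Numerics; a minimal sketch with made-up data:

```csharp
using MathNet.Numerics;

// Hypothetical sample data.
double[] xdata = { 0.0, 1.0, 2.0, 3.0, 4.0 };
double[] ydata = { 1.1, 2.0, 5.1, 10.2, 16.9 };

// Coefficients ordered by ascending power: p[0] + p[1]*x + p[2]*x^2.
double[] p = Fit.Polynomial(xdata, ydata, 2);
```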
Multiple Regression
The \(x\) in the linear model can also be a vector \(\mathbf x = [x^{(1)}\; x^{(2)} \cdots x^{(k)}]\), and the arbitrary functions \(f_i(\mathbf x)\) can accept vectors instead of scalars.
If we use \(f_i(\mathbf x) := x^{(i)}\) and add an intercept term \(f_0(\mathbf x) := 1\), we end up with the simplest form of ordinary multiple regression:

\[y : \mathbf x \mapsto p_0 + p_1 x^{(1)} + p_2 x^{(2)} + \cdots + p_N x^{(N)}\]
For example, for the data points \((\mathbf x, y)\) given by \(((1, 4), 15)\), \(((2, 5), 20)\) and \(((3, 2), 10)\), we can evaluate the best fitting parameters as shown below:
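A sketch using Fit.MultiDim; passing intercept: true prepends the constant function for \(p_0\):

```csharp
using MathNet.Numerics;

// One row per data point, one column per component of x.
double[] p = Fit.MultiDim(
    new[] { new[] { 1.0, 4.0 }, new[] { 2.0, 5.0 }, new[] { 3.0, 2.0 } },
    new[] { 15.0, 20.0, 10.0 },
    intercept: true);
```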
The Fit.MultiDim routine uses normal equations, but you can always choose to explicitly use e.g. the QR decomposition for more precision by using the MultipleRegression class directly:
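For example, the same fit solved via the QR decomposition:

```csharp
using MathNet.Numerics.LinearRegression;

// Same arguments as Fit.MultiDim, but with an explicit decomposition choice.
double[] p = MultipleRegression.QR(
    new[] { new[] { 1.0, 4.0 }, new[] { 2.0, 5.0 }, new[] { 3.0, 2.0 } },
    new[] { 15.0, 20.0, 10.0 },
    intercept: true);
```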
Arbitrary Linear Combination
In multiple regression, the functions \(f_i(\mathbf x)\) can also operate on the whole vector or mix its components arbitrarily and apply any functions to them, provided they are defined at all the data points. For example, let’s have a look at the following model in two dimensions, complicated but still linear in its parameters (the specific functions here are illustrative):

\[z : (x, y) \mapsto p_0 + p_1 \sin(x) + p_2 \ln(y) + p_3\, x y\]
Since we map \((x, y)\) to \(z\), we need to organize the tuples in two arrays:
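For instance, with hypothetical observations (the values below are made up):

```csharp
// Each row of xy holds one (x, y) pair; z holds the corresponding responses.
double[][] xy =
{
    new[] { 1.0, 2.0 },
    new[] { 2.0, 3.0 },
    new[] { 3.0, 1.5 },
    new[] { 4.0, 4.0 },
};
double[] z = { 4.1, 6.8, 4.7, 10.2 };
```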
Then we can call Fit.LinearMultiDim with our model, which will return an array with the best fitting 4 parameters \(p_0, p_1, p_2, p_3\):
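A sketch, assuming the arrays above and the illustrative model:

```csharp
using System;
using MathNet.Numerics;

// One function per parameter, matching the model above.
double[] p = Fit.LinearMultiDim(xy, z,
    d => 1.0,             // p0: constant term
    d => Math.Sin(d[0]),  // p1 * sin(x)
    d => Math.Log(d[1]),  // p2 * ln(y)
    d => d[0] * d[1]);    // p3 * x*y
```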
Evaluating the model at specific data points
Let’s say we have the following model:
\[y : x \mapsto a + b \ln x\]
For this case we can use the Fit.LinearCombination function:
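A minimal sketch with made-up data:

```csharp
using System;
using MathNet.Numerics;

// Hypothetical data roughly following y = a + b*ln(x).
double[] x = { 1.0, 2.0, 4.0, 8.0 };
double[] y = { 2.1, 3.4, 4.8, 6.3 };

// One function per parameter: 1 for a, ln(x) for b.
double[] p = Fit.LinearCombination(x, y, t => 1.0, t => Math.Log(t));
```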
In order to evaluate the resulting model at specific data points we can manually apply the values of p to the model function, or we can use an alternative function with the Func suffix that returns a function instead of the model parameters. The returned function can then be used to evaluate the parametrized model:
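Continuing the example above with Fit.LinearCombinationFunc:

```csharp
// Returns the parametrized model itself instead of the parameter array.
Func<double, double> f = Fit.LinearCombinationFunc(x, y, t => 1.0, t => Math.Log(t));
double prediction = f(3.0); // evaluate the fitted model at x = 3.0
```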
Linearizing non-linear models by transformation
Sometimes it is possible to transform a non-linear model into a linear one. For example, the following power function
\[z : (x, y) \mapsto u x^v y^w\]
can be transformed into the following linear model, with \(\hat z := \ln z\) and \(\hat u := \ln u\):

\[\hat z : (x, y) \mapsto \hat u + v \ln x + w \ln y\]
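A sketch of the round trip with made-up samples: fit the linearized model on \(\ln z\), then transform the intercept back with \(u = e^{\hat u}\):

```csharp
using System;
using System.Linq;
using MathNet.Numerics;

// Hypothetical samples of z = u * x^v * y^w.
double[][] xy = { new[] { 1.0, 2.0 }, new[] { 2.0, 2.0 }, new[] { 3.0, 4.0 } };
double[] z = { 2.4, 4.9, 15.7 };

// Fit ln z = u^ + v*ln(x) + w*ln(y).
double[] p = Fit.LinearMultiDim(xy, z.Select(Math.Log).ToArray(),
    d => 1.0,             // u^ = ln u
    d => Math.Log(d[0]),  // v
    d => Math.Log(d[1])); // w

double u = Math.Exp(p[0]); // transform the intercept back
double v = p[1];
double w = p[2];
```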
Weighted Regression
Sometimes the regression error can be reduced by dampening specific data points. We can achieve this by introducing a weight matrix \(W\) into the normal equations: \(\mathbf X^T W \mathbf X \mathbf p = \mathbf X^T W \mathbf y\). Such weight matrices are often diagonal, with a separate weight for each data point on the diagonal.
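A sketch using Math.NET's WeightedRegression class; the design matrix, response and weights below are illustrative:

```csharp
using MathNet.Numerics.LinearAlgebra;
using MathNet.Numerics.LinearRegression;

// Illustrative design matrix, response, and one weight per data point.
Matrix<double> X = Matrix<double>.Build.DenseOfArray(new double[,]
{
    { 1.0, 10.0 }, { 1.0, 20.0 }, { 1.0, 30.0 },
});
Vector<double> y = Vector<double>.Build.Dense(new[] { 15.0, 21.0, 25.0 });
Matrix<double> W = Matrix<double>.Build.DiagonalOfDiagonalArray(new[] { 1.0, 0.5, 1.0 });

// Solves X^T W X p = X^T W y.
Vector<double> p = WeightedRegression.Weighted(X, y, W);
```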