Scatter Charts and Regression

Since version 1.2.1, flashChart supports a new chart type: scatter charts. A scatter chart (or scattergraph) is a type of mathematical diagram that uses Cartesian coordinates to display the values of two variables for a set of data.

The data is displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis. This kind of plot is also called a scatter plot, scattergram, scatter diagram or scatter graph.

Overview

A scatter plot can suggest various kinds of correlations between variables with a certain confidence interval. For example, for weight and height, weight would be on the x axis and height on the y axis. Correlations may be positive (rising), negative (falling), or null (uncorrelated). If the pattern of dots slopes from lower left to upper right, it suggests a positive correlation between the variables being studied. If the pattern of dots slopes from upper left to lower right, it suggests a negative correlation.

A line of best fit (also called a "trendline") can be drawn in order to study the correlation between the variables. An equation for the correlation between the variables can be determined by established best-fit procedures. For a linear correlation, the best-fit procedure is known as linear regression and is guaranteed to generate a correct solution in finite time. No universal best-fit procedure is guaranteed to generate a correct solution for arbitrary relationships.

A scatter plot is also very useful when we wish to see how two comparable data sets agree with each other. In this case, an identity line, i.e., a y=x line or a 1:1 line, is often drawn as a reference. The more the two data sets agree, the more the points tend to concentrate in the vicinity of the identity line; if the two data sets are numerically identical, the points fall exactly on the identity line.
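
The line of best fit and the direction of a correlation can also be computed by hand. The following is a minimal Python sketch of ordinary least squares for f(x) = ax + b together with the Pearson correlation coefficient. It is not flashChart code; the function names and the arm span and height values are made up for illustration.

```python
from math import sqrt

def linear_fit(xs, ys):
    """Least-squares fit of f(x) = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b

def pearson_r(xs, ys):
    """Correlation coefficient: > 0 rising, < 0 falling, near 0 uncorrelated."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Made-up sample data (cm), for illustration only.
arm_span = [156, 160, 165, 170, 175, 180, 186]
height = [158, 161, 163, 171, 174, 182, 185]

a, b = linear_fit(arm_span, height)
r = pearson_r(arm_span, height)
print(f"trendline: f(x) = {a:.3f}x + {b:.3f}, r = {r:.3f}")
```

A positive slope and a positive r correspond to dots rising from lower left to upper right; both turn negative for a falling pattern.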

This scatter chart illustrates the general nature of the correlation between arm span and height (if you hide the trend line by clicking "Trend", the correlation may not be obvious).

But reading from left to right on the horizontal scale, you can observe that narrow arm spans tend to be associated with people who are shorter, and wider arm spans tend to be associated with people who are taller -- that is, there appears to be an overall positive correlation between arm span and height.

One of the most powerful aspects of a scatter chart, however, is its ability to show nonlinear relationships between variables. Furthermore, if the data is represented by a mixture model of simple relationships, these relationships will be visually evident as superimposed patterns.

Regression

In statistics, regression analysis includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed.

Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. Please read more about regression analysis and related topics at Wikipedia.

If you request a scatter chart with flashChart, you may also request a basic regression analysis. flashChart provides three built-in calculations (all based on the linear regression model), which are described by the following functions:

  1. f(x) = ax + b (linear regression)
  2. f(x) = ab^x (exponential regression)
  3. f(x) = ax^b (power regression)

Setting flashChart's parameter show_regression_line to "1" requests a regression calculation and displays the resulting regression line on your scatter chart.
With the parameter show_regression_formula, you can additionally have the calculated formula displayed as a legend (key). This legend is assigned the click attribute key_onclick="toggle-visibility": clicking it hides or shows the regression line.
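
To illustrate how all three functions can rest on the linear regression model, here is a minimal Python sketch that fits the second and third function by taking logarithms and then running an ordinary linear fit. This is the usual textbook approach and only an assumption about what flashChart does internally; the function names are hypothetical.

```python
from math import exp, log

def linear_fit(xs, ys):
    """Least-squares fit of f(x) = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def exponential_fit(xs, ys):
    """Fit f(x) = a * b**x via ln(y) = ln(a) + x*ln(b); requires y > 0."""
    slope, intercept = linear_fit(xs, [log(y) for y in ys])
    return exp(intercept), exp(slope)   # (a, b)

def power_fit(xs, ys):
    """Fit f(x) = a * x**b via ln(y) = ln(a) + b*ln(x); requires x, y > 0."""
    slope, intercept = linear_fit([log(x) for x in xs], [log(y) for y in ys])
    return exp(intercept), slope        # (a, b)

# Made-up data that follows each model exactly, for illustration only.
xs = [1, 2, 3, 4, 5]
print(exponential_fit(xs, [3 * 2 ** x for x in xs]))
print(power_fit(xs, [2 * x ** 1.5 for x in xs]))
```

Because the fit is done on the logarithms, it minimizes the squared error of ln(y) rather than of y, and it cannot handle zero or negative values.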

Together with the regression formula, flashChart also displays the calculated coefficient of determination as "R". The coefficient of determination is the proportion of variability in your data that is accounted for by the calculated formula (flashChart's statistical model); it ranges from 0 (no correlation) to 1 (perfect correlation).
"R" (strictly, it should read "R^2") is a statistic that gives some information about the goodness of fit of a model. In regression, the coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An "R" of 1.0 indicates that the regression line perfectly fits the data.

Own Regression formula

flashChart's built-in regression models will not fit your data in every case, so flashChart lets you specify your own regression formula. The sample below shows a scenario in which a custom statistical model (a formula) is used to better reflect "measured reality". To use your own regression model, specify the parameter formula together with show_regression_line. You will find more information on how to specify your own formula in flashChart's documentation, "Tutorial and Reference". Finding the proper statistical model for your data may be more difficult. In the sample below I've set up a (fictitious) measurement described by the "known" model "f(x) = -0.8x^3 + 100x^2". If you do not know the (expected) relationship between the dependent and independent variables in your measured data, flashChart may help with a first analysis.
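
One way to judge whether a custom formula reflects the measurements better than a built-in model is to compare their coefficients of determination. The Python sketch below does this for the model named above and a plain linear fit. It does not use flashChart or its formula parameter; the data points are invented (the model plus a small alternating error) and all function names are hypothetical.

```python
def model(x):
    """The 'known' model from the text: f(x) = -0.8x^3 + 100x^2."""
    return -0.8 * x ** 3 + 100 * x ** 2

def linear_fit(xs, ys):
    """Least-squares fit of f(x) = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def r_squared(ys, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Invented measurements: model values with an alternating error of +/-500.
xs = list(range(10, 101, 10))
ys = [model(x) + (500 if i % 2 else -500) for i, x in enumerate(xs)]

a, b = linear_fit(xs, ys)
r2_linear = r_squared(ys, [a * x + b for x in xs])
r2_custom = r_squared(ys, [model(x) for x in xs])
print(f"linear: R^2 = {r2_linear:.4f}, custom: R^2 = {r2_custom:.4f}")
```

On data of this shape the straight line leaves a much larger residual than the cubic model, so its R^2 is noticeably lower.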

The law of diminishing returns (also law of diminishing marginal returns or law of increasing relative cost) states that in all productive processes, adding more of one factor of production, while holding all others constant, will at some point yield lower per-unit returns. The law of diminishing returns does not imply that adding more of a factor will decrease the total production, a condition known as negative returns, though in fact this is common.

For example, the use of fertilizer improves crop production on farms and in gardens; but at some point, adding more and more fertilizer improves the yield less per unit of fertilizer, and excessive quantities can even reduce the yield. Another common example is adding more workers to a job, such as assembling a car on a factory floor. At some point, adding more workers causes problems such as workers getting in each other's way or frequently having to wait for access to a part. In all of these processes, producing one more unit of output per unit of time will eventually cost increasingly more, because inputs are used less and less effectively.