Bivariate Statistics (11/15/14)
- Cross-tabs and Chi Square. Crosstabulations show the joint frequencies of two variables in a table. Chi Square is a test of significance: it gives the probability of seeing a relationship at least this strong in the sample if the null hypothesis (no relationship) were true. If the significance level (p-value) attached to the Chi Square statistic is less than 0.05 (5%), the relationship is significant. Otherwise, it is not significant. Used only for discrete variables (nominal and ordinal) with a relatively small number of categories. The basis of Chi Square is the comparison of the observed frequencies to the frequencies expected under the null hypothesis.
- Lambda is a measure of association (measures the strength of a relationship). Only appropriate in comparing two relationships (though a test of significance does exist). Whichever relationship has the higher value of Lambda is the stronger association. For discrete variables only. Range is 0 to 1. The essential element of Lambda is comparing the number of errors made in predicting the dependent variable without knowing the independent variable to the number of errors made in predicting the dependent variable when the independent variable is known.
- Gamma is a measure of association for ordinal variables. Only appropriate in comparing the value of Gamma for two relationships (though a test of significance does exist). Range is -1 to 1. Ordinal variables have a sense of direction, so negative values indicate inverse relationships. Remember that inverse here only means that the two variables move in opposite directions (one higher and the other lower, or vice versa). We are essentially comparing pairs of units to see whether they are ordered in the same direction on both variables, in opposite directions, or "tied" at some level.
- Correlation Coefficient is a measure of association for interval variables. The range is -1 to 1. Negative values simply mean an inverse relationship. The calculation of the correlation coefficient is built around shifting the origin to the mean values of X and Y (taking each value's deviation from its mean), summing the products of the X deviations with the Y deviations, and standardizing that sum by the spread of X and of Y.
- Regression produces the "best fit" linear equation (Y = a + bX) describing the impact of X on Y. The regression coefficient (the slope, or b in the prior equation) is calculated by least squares, which produces the line that minimizes the sum of the squared vertical distances from all observed values to the line. The coefficient represents how much of a change in Y you can expect from a one-unit change in X (which is the slope).
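
The calculations described in the list above can be sketched in a few lines of Python. This is only an illustration, not the procedure any particular statistics package uses. The function names, the 2x2 table, and the X and Y values are all made up for the example. The table is assumed to have the independent variable in the rows and the dependent variable in the columns, with categories in ascending order, and the sketch does not guard against empty rows or columns.

```python
from math import sqrt

def chi_square(table):
    """Return (Chi Square statistic, degrees of freedom) for a crosstab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected in this cell if there were no relationship.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def lambda_measure(table):
    """Lambda: proportional reduction in prediction errors."""
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)
    # Errors when always guessing the most common dependent category.
    errors_without = n - max(col_totals)
    # Errors when guessing the most common category within each row.
    errors_with = sum(sum(row) - max(row) for row in table)
    return (errors_without - errors_with) / errors_without

def gamma_measure(table):
    """Gamma: (concordant - discordant) / (concordant + discordant) pairs."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # only rows ranked higher
                for m in range(n_cols):
                    if m > j:    # higher on both variables
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # higher on one, lower on the other
                        discordant += table[i][j] * table[k][m]
                    # m == j is a tie and is ignored by Gamma
    return (concordant - discordant) / (concordant + discordant)

def deviations(values):
    """Shift the origin to the mean: each value minus the mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def pearson_r(x, y):
    """Correlation coefficient from deviations about the means."""
    dx, dy = deviations(x), deviations(y)
    sum_products = sum(p * q for p, q in zip(dx, dy))
    return sum_products / sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))

def least_squares_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    dx, dy = deviations(x), deviations(y)
    b = sum(p * q for p, q in zip(dx, dy)) / sum(p * p for p in dx)
    a = sum(y) / len(y) - b * sum(x) / len(x)
    return a, b

# Made-up data for illustration only.
table = [[30, 10],
         [10, 30]]
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

stat, df = chi_square(table)
print(stat, df)                        # 20.0 1
print(lambda_measure(table))           # 0.5
print(gamma_measure(table))            # 0.8
print(round(pearson_r(x, y), 3))       # 0.775
a, b = least_squares_line(x, y)
print(round(a, 3), round(b, 3))        # 2.2 0.6
```

For the made-up table, the Chi Square statistic of 20 with 1 degree of freedom is well above 3.841, the critical value at the 0.05 level, so the relationship would be called significant. Lambda of 0.5 means that knowing the row variable cuts the prediction errors in half.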