Psych Stats

## Purpose

- Descriptive statistics: describe data sets or extract characteristics of the group being observed.
- Inferential statistics: used to draw conclusions (predictions) about large numbers of individuals when only a small sample from the larger population is observed.

## Deviation

- Measure of variability = an index of diversity in the data distribution
- Deviation = distance from a data point to the mean
- The average (signed) deviation is always zero, because the positive and negative deviations cancel out.
- Squaring the deviations is the standard approach in statistics to eliminate negative numbers.
- SS: the sum of squared errors (squared deviations) is a good measure of the accuracy of the model (here, the mean).

- Average squared error = SS divided by the number of observations (N)
- Variance = the average squared error between the mean and the observations made (and so is a measure of how well the model fits the actual data). Also called the mean squared deviation.
- Standard Deviation = square root of the mean squared deviation (sqrt of the variance)
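
The chain from deviations to standard deviation can be traced step by step. The sketch below uses made-up scores and divides SS by N, as in the notes above; when estimating a population variance from a sample, N - 1 is commonly used instead.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

n = len(scores)
mean = sum(scores) / n

# Deviation of each score from the mean; the signed deviations sum to zero.
deviations = [x - mean for x in scores]

# Squaring removes the signs, so the deviations no longer cancel.
ss = sum(d ** 2 for d in deviations)

variance = ss / n              # mean squared deviation (SS divided by N)
std_dev = math.sqrt(variance)  # square root of the variance

print(sum(deviations), ss, variance, std_dev)
```

For these scores the mean is 5, SS is 32, the variance is 4 and the standard deviation is 2.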

## Measures of Central Tendency

- Mean = arithmetic mean (i.e., the average)
- Mode: the most frequent score in a distribution
- Mode is unaffected by extreme scores.
- Modes are generally useful only in unimodal distributions
- Generally best used for qualitative data (i.e. nominal scale), where the values are categories rather than numbers

- Median: the middle value of the ordered distribution scores
- The median is unaffected by extreme scores.

- Measure of Central Tendency = a single value that is typical of the data set

- Data = a measurement collected on a variable as a consequence of observation.

- Case = a single unit of observation
- Variable = a certain characteristic of a population that can take different values.
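
The three measures of central tendency in this section can be compared on a small example. This is a minimal sketch using Python's standard `statistics` module; the scores and colour labels are invented for illustration.

```python
import statistics

scores = [2, 3, 3, 4, 5]            # hypothetical scores
with_outlier = scores[:-1] + [50]   # same data, but the top score is extreme

mean_plain = statistics.fmean(scores)
mean_outlier = statistics.fmean(with_outlier)
median_plain = statistics.median(scores)
median_outlier = statistics.median(with_outlier)
mode_plain = statistics.mode(scores)
mode_outlier = statistics.mode(with_outlier)

# The mean is pulled towards the extreme score; the median and mode are not.
print(mean_plain, median_plain, mode_plain)
print(mean_outlier, median_outlier, mode_outlier)

# The mode also works for nominal (category) data, where a mean is meaningless.
colours = ["red", "blue", "red", "green"]
favourite = statistics.mode(colours)
print(favourite)
```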

## Population v Sample

- Population = the entire group of interest (e.g. all the people the conclusions are meant to apply to)
- Parameters = term for summary properties or measures of population values

- Sample = small subset of the population
- Statistics = term for summary properties or measures of sample values

- Arithmetic mean of a sample = X̄ (X with a bar on top), or M

## Z-Scores

- The z-score (or standard score) is the deviation of the i-th case from the mean, divided by the standard deviation: z = (X - M) / SD
- z-scores are a precise index of how an individual compares with the rest of the group in terms of distance from the mean.
- They also provide a metric for comparing performance/measurements on completely unrelated scales (Bradman vs Einstein)
- Transforming ANY DISTRIBUTION of raw scores into z-scores results in a distribution with a MEAN of 0 and a STANDARD DEVIATION of 1
- A negative z-score means that the original score was below the mean. A positive z-score means that the original score was above the mean.
- The area under the distribution curve between any two z-score values represents the number, proportion or percent of the scores that fall between those two values
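
As a rough illustration of the points above, the sketch below standardises a set of invented scores, checks that the result has mean 0 and standard deviation 1, and then looks up the area between two z values. The area lookup assumes the scores are normally distributed, which the notes do not state explicitly.

```python
import statistics
from statistics import NormalDist

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical raw scores

mean = statistics.fmean(scores)
sd = statistics.pstdev(scores)      # standard deviation using N in the denominator

# z = (X - M) / SD for every case
z_scores = [(x - mean) / sd for x in scores]

z_mean = statistics.fmean(z_scores)   # should be 0 (up to rounding)
z_sd = statistics.pstdev(z_scores)    # should be 1 (up to rounding)

# Proportion of a normal distribution lying between z = -1 and z = +1
std_normal = NormalDist()             # mean 0, standard deviation 1
area = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

print(z_scores, z_mean, z_sd, round(area, 4))
```

Under the normality assumption, roughly 68% of scores fall within one standard deviation of the mean.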

## Central Limit Theorem

- The sampling distribution = the frequency distribution of sample means, obtained by drawing many samples of the same size from a population
- The Central Limit Theorem: this distribution of sample means is approximately normal when the sample size is sufficiently large (even if the population distribution is not normal).
- The mean of the sampling distribution equals the population mean

- Standard Error of the Mean - SEM (or σM) = standard deviation of the sampling distribution, equal to σ/√N for samples of size N
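
A small simulation can make the sampling distribution concrete. The sketch below is an illustration under assumptions of my own choosing: a right-skewed (exponential) population, samples of size 40, and 2,000 repeated samples. It compares the mean and standard deviation of the sample means with the population mean and with σ/√N.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable

# A clearly non-normal (right-skewed) population: exponential with mean 1.
population = [rng.expovariate(1.0) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 40                    # size of each sample
sample_means = [
    statistics.fmean(rng.sample(population, n)) for _ in range(2_000)
]

mean_of_means = statistics.fmean(sample_means)   # close to the population mean
observed_sem = statistics.pstdev(sample_means)   # SD of the sampling distribution
expected_sem = pop_sd / n ** 0.5                 # sigma / sqrt(N)

print(pop_mean, mean_of_means)
print(observed_sem, expected_sem)
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is strongly skewed.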

## Confidence Interval

- Confidence interval - an estimated range of values which is likely to include an unknown population parameter
- In other words, the boundaries within which we believe the true value of the mean will fall.

Determining 99% confidence interval for population mean
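
One common way to do this is M ± z × SEM, where z is the value that leaves 0.5% in each tail of the normal distribution (about 2.58). The sketch below assumes a normal sampling distribution and uses invented scores; with a sample this small a critical t value would normally replace z, so treat it as an illustration of the arithmetic only.

```python
from statistics import NormalDist, fmean, stdev

sample = [98, 102, 105, 95, 110, 100, 97, 103, 99, 101]  # hypothetical scores

n = len(sample)
m = fmean(sample)                 # sample mean
sem = stdev(sample) / n ** 0.5    # estimated standard error of the mean

# z value that leaves 0.5% in each tail (two-tailed 99% interval)
z_crit = NormalDist().inv_cdf(0.995)

lower = m - z_crit * sem
upper = m + z_crit * sem

print(round(z_crit, 3), round(lower, 2), round(upper, 2))
```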

## Null Hypothesis

- Null hypothesis H0 - refers to some null or conservative state of affairs; the assumption you would make without evidence to the contrary.
- H0 contains the argument “that the observed results occurred by chance due to fluctuations of sampling”.
- Alternative hypothesis H1 - the complement of the null hypothesis. It is not tested directly but adopted upon a rejection of the null hypothesis. It usually expresses the experimenter’s belief about the parameter being studied.

Testing Null Hypothesis:

- *t* = standardised difference between two means
- The significance level is set at α = 0.05, thus critical t = 1.96 (two-tailed, for large samples)
- If |t(mean1 - mean2)| > critical t, p < 0.05
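
The decision rule can be sketched as follows. The notes do not say which form of the standard error is used, so this example assumes two independent groups and an unpooled standard error; the data and the helper name `t_statistic` are hypothetical. The 1.96 criterion is the large-sample value from the notes; with groups this small the critical t would be somewhat larger.

```python
import math
import statistics

group_1 = [12, 15, 14, 16, 13, 15, 14, 17]   # hypothetical scores, condition 1
group_2 = [10, 11, 12, 9, 13, 11, 10, 12]    # hypothetical scores, condition 2

def t_statistic(a, b):
    """Difference between two means divided by its estimated standard error."""
    se_diff = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.fmean(a) - statistics.fmean(b)) / se_diff

CRITICAL_T = 1.96   # two-tailed criterion at alpha = 0.05 for large samples

t = t_statistic(group_1, group_2)
reject_null = abs(t) > CRITICAL_T

print(round(t, 2), reject_null)
```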

## Symmetric v Skewed

- Unimodal = one mode = a single value (or peak) with the highest frequency
- Negatively skewed distribution: the left tail is longer, observations are clustered towards higher end of the scale
- Positively skewed distribution: the right tail is longer, observations are clustered towards lower end of the scale

## Percentiles

- Percentile rank = the percentage of scores in a distribution that a specific score is greater than or equal to = (CF/N) * 100, where CF is the cumulative frequency up to and including that score and N is the number of scores.
- The percentile rank shows how an individual score compares to the other scores in the sample.
- Percentiles are limited because the scores are merely ordered.
- The distance between the scores is not specified.

- Percentile score: the score corresponding to a particular percentile rank.
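
The (CF/N) * 100 formula can be written out directly. In the sketch below the function names and scores are hypothetical, and the percentile score is taken to be the lowest score whose percentile rank reaches the requested value, which is one of several conventions in use.

```python
def percentile_rank(scores, score):
    """Percentage of scores less than or equal to `score`: (CF / N) * 100."""
    cf = sum(1 for s in scores if s <= score)   # cumulative frequency
    return cf * 100 / len(scores)

def percentile_score(scores, rank):
    """Lowest score whose percentile rank is at least `rank`."""
    for s in sorted(scores):
        if percentile_rank(scores, s) >= rank:
            return s
    raise ValueError("rank must be between 0 and 100")

scores = [55, 60, 60, 65, 70, 75, 80, 85, 90, 95]   # hypothetical test scores

print(percentile_rank(scores, 70))    # 5 of the 10 scores are <= 70
print(percentile_score(scores, 50))   # score sitting at the 50th percentile
```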

## Frequency

- Cumulative frequency - the running total of counts: the count for the current score (or interval) plus the counts for all lower scores.
- I.e. how many observations are at or below the current score
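
A frequency table with a cumulative column can be built in a few lines. This is a minimal sketch with invented scores, using `collections.Counter` for the counts and `itertools.accumulate` for the running total.

```python
from collections import Counter
from itertools import accumulate

scores = [3, 1, 2, 3, 3, 2, 4, 1, 3, 4]   # hypothetical scores

counts = Counter(scores)
values = sorted(counts)                      # distinct scores, lowest first
frequencies = [counts[v] for v in values]    # how often each score occurs
cumulative = list(accumulate(frequencies))   # running total of the frequencies

for value, f, cf in zip(values, frequencies, cumulative):
    print(value, f, cf)
```

The last cumulative frequency always equals the total number of observations.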

- Qualitative variables = attributes of the variable fall into discrete categories (e.g. gender, favorite color, country of birth)
- Quantitative variables = attributes of the variable are assigned numerical values that can be anywhere within a range (e.g. age, weight, height, IQ, speed of driving)

## Measurement Scales

- Nominal scale = identity
- Used for categorical/discrete data
- Any case can be placed in one and only one category.
- Numbers used as labels; arbitrary

- Ordinal scale = identity + order
- Used where scores can be ranked / ordered;
- There is no objective distance between any two points on your subjective scale.

- Interval scale = identity + order + equidistance
- Measurement at this level allows us to separate objects or events into mutually exclusive categories, arranged in a specific order, and specify the distance between data points
- On this scale numbers are separated by equal-sized intervals but have no meaningful or absolute zero.
- Doesn't support ratios - e.g. IQ 140 is not twice as high as IQ 70.

- Ratio scale = identity + order + equidistance + origin (a meaningful, absolute zero)
- Measurement at this level allows us to separate objects or events into mutually exclusive categories,
- arrange them in a specific order,
- specify the distance between data points,
- and compare ratios constructed from the data.

## Graphing

- Horizontal Axis
- also called the abscissa, or X axis
- the values of the variable

- Vertical Axis:
- also called the ordinate, or Y axis
- the frequencies, or proportions or percentages

### Class Intervals

When choosing a class interval width one aims to produce a concise picture of the data, with minimal loss of information. Generally, use 6-12 intervals of equal width.
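
One possible way to turn this guideline into a grouped frequency table is sketched below. It assumes whole-number scores and rounds the width up so that the chosen number of intervals covers the full range; the function name `class_intervals` and the data are hypothetical.

```python
import math

def class_intervals(scores, k=8):
    """Split the range of whole-number scores into k equal-width intervals.

    Returns a list of (lower limit, upper limit, frequency) tuples.
    """
    low, high = min(scores), max(scores)
    # Round the width up so that k intervals cover every score.
    width = math.ceil((high - low + 1) / k)
    table = []
    for i in range(k):
        start = low + i * width
        end = start + width - 1
        freq = sum(1 for s in scores if start <= s <= end)
        table.append((start, end, freq))
    return table

scores = [41, 45, 52, 58, 60, 63, 67, 71, 74, 78, 82, 85, 88, 93, 99]

for start, end, freq in class_intervals(scores, k=6):
    print(f"{start}-{end}: {freq}")
```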

## Correlation

- The Pearson product-moment correlation coefficient *r* is sensitive only to linear relationships.
- Correlation != Causation. To test causation, experimental designs are best: systematically manipulate X and measure Y.
- If we know the correlation between two variables (e.g. 0.61) and we have a z-score for X, the predicted z-score for Y is the product of the two: predicted zY = r * zX.
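
The link between *r* and z-scores can be shown in code. The sketch below computes Pearson's *r* as the mean product of paired z-scores (using N in the standard deviation) and then applies the prediction rule from the last bullet; the function names and data are hypothetical.

```python
import statistics

def z_scores(values):
    """Standardise values using the mean and the SD computed with N."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def pearson_r(xs, ys):
    """Pearson r as the mean product of paired z-scores."""
    products = [zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))]
    return statistics.fmean(products)

def predict_zy(r, zx):
    """Predicted z-score on Y for a case with z-score zx on X."""
    return r * zx

hours = [1, 2, 3, 4, 5]   # hypothetical X values
marks = [2, 4, 5, 4, 5]   # hypothetical Y values

r = pearson_r(hours, marks)
print(round(r, 4))
print(predict_zy(0.61, 1.5))   # the r = 0.61 case from the notes, with zX = 1.5
```

Because |r| is at most 1, the predicted z-score for Y is never further from the mean than the z-score for X.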

page revision: 7, last edited: 25 Oct 2012 08:03