Psych Stats

## Purpose

• Descriptive statistics: describe data sets or extract characteristics of the group being observed

• Inferential statistics: used to draw conclusions (predictions) about large numbers of individuals when only a small sample from the larger population is observed.

## Deviation

• Measure of variability = an index of diversity in the data distribution
• Deviation = distance from a data point to the mean
• The average of the (signed) deviations is always zero, because positive and negative deviations cancel out.
• Squaring the deviations is the standard approach in statistics to eliminate negative numbers.
• SS: the sum of squared errors (squared deviations) is a good measure of the accuracy of the model; here the model is the mean.
• Average squared error = SS divided by the number of observations (N)
• Variance = the average squared error between the mean and the observations made (and so is a measure of how well the model fits the actual data). Also called the mean squared deviation.
• Standard Deviation = square root of the mean squared deviation (sqrt of the variance)
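
The chain from deviations to standard deviation can be followed on a small data set. This is a minimal sketch: the scores are made up, and the variance divides by N as in the notes above (many textbooks divide by N - 1 when estimating from a sample).

```python
from math import sqrt

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, not from the notes
n = len(scores)
mean = sum(scores) / n

# Deviation = distance from each data point to the mean (signed).
deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)   # positive and negative deviations cancel

# Squaring removes the signs; SS is the sum of squared deviations.
ss = sum(d ** 2 for d in deviations)

variance = ss / n      # mean squared deviation (dividing by N, as in the notes)
sd = sqrt(variance)    # standard deviation

print(mean, sum_of_deviations, ss, variance, sd)
```

For these scores the mean is 5, SS is 32, the variance is 4 and the standard deviation is 2.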

## Measures of Central Tendency

• Mean = arithmetic mean (i.e., the average)
• Mode: the most frequent score in a distribution
• Mode is unaffected by extreme scores.
• Modes are generally useful only in unimodal distributions
• Mode is generally best suited to qualitative data (i.e. nominal scale), where the values are categories rather than numbers
• Median: the middle value of the ordered distribution scores
• The median is unaffected by extreme scores.
• Measure of Central Tendency = a measure that is typical of the set of the data
• Data = a measurement collected on a variable as a consequence of observation.
• Case = a single unit of observation
• Variable = a certain characteristic of a population that can take different values.
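
The effect of an extreme score on the mean, median and mode can be checked with Python's standard `statistics` module. The scores below are hypothetical; the second list replaces the highest score with an extreme one.

```python
import statistics

scores = [3, 4, 4, 5, 6]         # hypothetical scores
with_extreme = [3, 4, 4, 5, 60]  # same scores, highest one replaced by an extreme value

for label, data in (("original", scores), ("with extreme score", with_extreme)):
    print(
        label,
        "mean =", statistics.mean(data),
        "median =", statistics.median(data),
        "mode =", statistics.mode(data),
    )
```

The mean moves from 4.4 to 15.2, while the median and the mode both stay at 4.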

## Population v Sample

• Population = the entire group of interest
• Parameters = term for summary properties or measures of population values
• Sample = small subset of the population
• Statistics = term for summary properties or measures of sample values

• Arithmetic mean of a sample = X̄ (X with a bar on top), or M

## Z-Scores

• z-score or Standard Score is the deviation of the i-th case from the mean, divided by the standard deviation: z = (X - M) / SD
• Z-scores are a precise index of how an individual compares with the rest of the group in terms of distance from the mean.
• They also provide a metric for comparing performance/measurements on completely unrelated scales (Bradman vs Einstein)
• Transforming ANY DISTRIBUTION of raw scores into Z-scores results in a distribution with a MEAN of 0 and a STANDARD DEVIATION of 1
• A negative z-score means that the original score was below the mean. A positive z-score means that the original score was above the mean.
• The area under the distribution curve between any two z-score values represents the number, proportion or percent of the scores that fall between those two values
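
A short sketch of the z-score transformation, using made-up raw scores and the divide-by-N standard deviation. The last step uses `statistics.NormalDist` and only applies if the scores are assumed to be normally distributed.

```python
from math import sqrt
from statistics import NormalDist

def z_scores(scores):
    """Convert raw scores to z-scores: (score - mean) / standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    sd = sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return [(x - mean) / sd for x in scores]

raw = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores (mean 5, SD 2)
z = z_scores(raw)
print(z)

# The transformed distribution has mean 0 and standard deviation 1.
z_mean = sum(z) / len(z)
z_sd = sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))
print(z_mean, z_sd)

# Assuming a normal distribution: proportion of scores between z = -1 and z = +1.
area = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)
print(round(area, 4))
```

Under the normality assumption, roughly 68% of scores fall within one standard deviation of the mean.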

## Central Limit Theorem

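• The sampling distribution = frequency distribution of sample means (the means of many samples of the same size drawn from one population)
• The Central Limit Theorem: the sampling distribution of the mean approaches a normal distribution as the sample size increases, even if the population distribution is not normal (provided the population variance is finite).
• The mean of the sampling distribution equals the population mean
• Standard Error of the Mean - SEM (or σM) = standard deviation of the sampling distribution = σ / √n, where n is the size of each sample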
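
The behaviour of sample means can be illustrated with a small simulation. Everything here is an assumption for illustration: the population is a simulated, strongly skewed (exponential) one, and the sample size and number of samples are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical skewed population (exponential, so clearly not normal).
population = [random.expovariate(1.0) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 30               # size of each sample
num_samples = 2_000  # how many samples to draw

# Sampling distribution: the means of many samples of size n.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sem_observed = statistics.pstdev(sample_means)  # SD of the sampling distribution
sem_expected = pop_sd / n ** 0.5                # sigma / sqrt(n)

print(pop_mean, mean_of_means)
print(sem_expected, sem_observed)
```

The mean of the sample means should land close to the population mean, and the spread of the sample means close to σ / √n. A histogram of `sample_means` would look far more symmetric than the population itself.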

## Confidence Interval

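• Confidence interval - an estimated range of values which is likely to include an unknown population parameter
• For the mean, these are the boundaries within which we believe the true (population) mean will fall.
• Determining a 99% confidence interval for the population mean: M ± 2.58 × SEM, where 2.58 is the z-score that cuts off the outer 1% of a normal distribution (0.5% in each tail)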
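
A minimal sketch of a 99% confidence interval for a mean. It assumes the normal (z) critical value is appropriate, which is a large-sample approximation; with a small sample like this made-up one, a t critical value would give a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

# Hypothetical sample of scores.
sample = [98, 102, 105, 110, 95, 100, 107, 99, 103, 101, 97, 104, 108, 96, 100, 106]

n = len(sample)
m = fmean(sample)
sem = stdev(sample) / sqrt(n)  # estimated standard error of the mean

# z value leaving 0.5% in each tail of the standard normal distribution.
z_crit = NormalDist().inv_cdf(0.995)  # about 2.58

lower = m - z_crit * sem
upper = m + z_crit * sem
print(f"99% CI: {lower:.2f} to {upper:.2f}")
```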

## Null Hypothesis

• Null hypothesis H0 - refers to some null or conservative state of affairs; the assumption you would make without evidence to the contrary.
• H0 contains the argument “that the observed results occurred by chance due to fluctuations of sampling”.
• Alternative hypothesis H1 - the complement of the null hypothesis. It is not tested directly but adopted upon a rejection of the null hypothesis. It usually expresses the experimenter’s belief about the parameter being studied.

Testing Null Hypothesis:

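• t = standardised difference between two means, i.e. the difference divided by its standard error
• Significance level is set at α = 0.05 (two-tailed); for large samples the critical value is therefore approximately 1.96. With small samples the critical t is larger and depends on the degrees of freedom.
• If |t(mean1 - mean2)| > critical t, then p < 0.05 and H0 is rejected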
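
The decision rule can be sketched as follows. The helper name, the data, and the choice of an unpooled standard error are assumptions for illustration; the notes do not say which form of the t-test is intended. The cut-off of 1.96 is the large-sample value, so for groups this small a t-table value would be the stricter choice.

```python
from math import sqrt
from statistics import fmean, variance

def standardised_difference(a, b):
    """Difference between two sample means divided by its (unpooled) standard error."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (fmean(a) - fmean(b)) / se

# Hypothetical scores for two independent groups.
group_1 = [12, 15, 14, 16, 13, 15, 14, 17]
group_2 = [10, 11, 12, 10, 13, 11, 12, 9]

CRITICAL = 1.96  # two-tailed, alpha = 0.05, large-sample approximation

t = standardised_difference(group_1, group_2)
reject_h0 = abs(t) > CRITICAL
print(round(t, 2), reject_h0)
```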

## Symmetric v Skewed

• Unimodal = one mode = a single value that occurs most often
• Negatively skewed distribution: the left tail is longer, observations are clustered towards higher end of the scale
• Positively skewed distribution: the right tail is longer, observations are clustered towards lower end of the scale

## Percentiles

• Percentile rank = the percentage of scores in a distribution that a specific score is greater than or equal to = (CF/N) * 100, where CF is the cumulative frequency of that score and N is the number of scores.
• The percentile rank shows how an individual score compares to the other scores in the sample.
• Percentiles are limited because the scores are merely ordered.
• The distance between the scores is not specified.
• Percentile score: the score corresponding to a particular percentile rank.
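
The (CF/N) * 100 formula translates directly into code. This is a sketch on made-up scores; the `percentile_score` helper uses one simple convention (the lowest score whose percentile rank reaches the target), and other conventions exist.

```python
def percentile_rank(scores, score):
    """Percentage of scores that `score` is greater than or equal to: (CF / N) * 100."""
    cf = sum(1 for x in scores if x <= score)  # cumulative frequency of `score`
    return cf / len(scores) * 100

def percentile_score(scores, rank):
    """Lowest score whose percentile rank is at least `rank` (None if rank > 100)."""
    for x in sorted(scores):
        if percentile_rank(scores, x) >= rank:
            return x
    return None

scores = [55, 60, 65, 65, 70, 75, 80, 85, 90, 95]  # hypothetical scores

print(percentile_rank(scores, 70))   # 5 of the 10 scores are <= 70
print(percentile_score(scores, 50))  # score sitting at the 50th percentile
```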

## Frequency

• Cumulative frequency - the count for the current score (or interval) plus the counts for all lower scores (or intervals).
• I.e. the number of scores at or below the current score
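
A cumulative frequency table can be built by counting each score and then keeping a running total. The scores below are hypothetical.

```python
from collections import Counter
from itertools import accumulate

scores = [3, 1, 2, 3, 3, 2, 5, 4, 3, 2]  # hypothetical scores

freq = Counter(scores)                 # frequency of each distinct score
values = sorted(freq)                  # distinct scores, lowest first
counts = [freq[v] for v in values]     # frequency per score
cumulative = list(accumulate(counts))  # running total = cumulative frequency

for value, f, cf in zip(values, counts, cumulative):
    print(f"score {value}: frequency {f}, cumulative frequency {cf}")
```

The last cumulative frequency always equals N, the total number of scores.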

• Qualitative variables = Attributes of the variable fall into discrete categories; (e.g. gender, favorite color, country of birth)
• Quantitative variables = Attributes of the variable are assigned values that can be anywhere within a range; (e.g. age, weight, height, IQ, speed of driving)

## Measurement Scales

• Nominal scale = identity
• Used for categorical/discrete data
• Any case can be placed in one and only one category.
• Numbers used as labels; arbitrary

• Ordinal scale = identity + order
• Used where scores can be ranked / ordered;
• There is no objective distance between any two points on the scale; only the order is meaningful.

• Interval scale = identity + order + equidistance
• Measurement at this level allows us to separate objects or events into mutually exclusive categories, arranged in a specific order, and specify the distance between data points
• On this scale numbers are separated by equal-sized intervals but have no meaningful or absolute zero.
• Does not support ratios - e.g. IQ 140 is not twice as high as IQ 70.

• Ratio scale = identity + order + equidistance + origin
• separate objects or events into mutually exclusive categories
• arranged in a specific order
• specify the distance between data points
• compare ratios constructed from the data.

## Graphing

• Horizontal Axis:
• also called the abscissa, or X axis
• the values of the variable
• Vertical Axis:
• also called the ordinate, or Y axis
• the frequencies, or proportions or percentages

### Class Intervals

When choosing a class interval width one aims to produce a concise picture of the data, with minimal loss of information. Generally, use 6 to 12 intervals of equal width.
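
Grouping scores into equal-width class intervals might look like the sketch below. The function name and the data are made up, the top score is placed in the last interval, and the code assumes the scores are not all identical (otherwise the width would be zero).

```python
def class_intervals(scores, k):
    """Group scores into k equal-width intervals; returns (lower, upper, count) tuples."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / k  # assumes hi > lo
    counts = [0] * k
    for x in scores:
        i = min(int((x - lo) / width), k - 1)  # the maximum score goes in the last interval
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(k)]

# Hypothetical scores.
scores = [41, 45, 47, 52, 53, 55, 58, 60, 61, 63,
          64, 66, 68, 70, 72, 75, 77, 81, 88, 101]

table = class_intervals(scores, 6)
for lower, upper, count in table:
    print(f"{lower:5.1f} to {upper:5.1f}: {count}")
```

With six intervals of width 10 the counts are 3, 5, 6, 3, 2 and 1, which already shows the shape of the distribution without listing all 20 scores.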

## Correlation

• The Pearson product-moment correlation coefficient r is sensitive only to linear relationships.
• Correlation != Causation. To test causation, experimental designs are best: systematically manipulate X and measure Y.
• If we know the correlation between two variables (e.g. r = 0.61) and we have a z-score for X, the predicted z-score for Y is the product of the two: predicted zY = r × zX.
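
Two of these points can be checked numerically. The sketch computes r as the average product of paired z-scores (one standard way to write Pearson's r), using made-up data; the z-score of 1.5 for X is an arbitrary example value.

```python
from math import sqrt

def z_scores(values):
    """z = (value - mean) / SD, with the SD computed by dividing by N."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(xs, ys):
    """Pearson r as the mean of the products of paired z-scores."""
    return sum(zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))) / len(xs)

xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]  # perfectly linear in x
curved = [x ** 2 for x in xs]     # perfectly related to x, but not linearly

r_linear = pearson_r(xs, linear)  # close to 1
r_curved = pearson_r(xs, curved)  # close to 0 despite the perfect relationship

# Predicting a z-score for Y from a known correlation and a z-score for X.
r = 0.61
z_x = 1.5                 # hypothetical z-score on X
predicted_z_y = r * z_x   # about 0.915

print(r_linear, r_curved, predicted_z_y)
```

Because |r| is at most 1, the predicted z-score for Y is never further from the mean than the z-score for X.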

page revision: 7, last edited: 25 Oct 2012 08:03