Psych Stats


  • Descriptive statistics: describe data sets or extract characteristics of the group being observed

  • Inferential statistics: used to draw conclusions (predictions) about large numbers of individuals when only a small sample from the larger population is observed.


  • Measure of variability = an index of diversity in the data distribution
  • Deviation = distance from a data point to the mean
    • The average (signed) deviation is always zero, because the deviations above and below the mean cancel out.
    • Squaring the deviations is the standard approach to eliminating negative numbers in statistics.
    • SS = sum of squared deviations (errors); a good measure of how accurately the mean, as a model, fits the data.
  • Average squared error = SS divided by the number of observations (N)
  • Variance = the average squared error between the mean and the observations made (and so a measure of how well the model fits the actual data). Also called the mean squared deviation.
  • Standard deviation = square root of the mean squared deviation (sqrt of the variance)
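
A minimal sketch of the chain deviation → SS → variance → standard deviation, using a small made-up data set (the numbers are illustrative only). It divides by N, as in the notes above; the common sample formula divides by N - 1 instead.

```python
from math import sqrt

# Made-up scores, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = sum(scores) / n

# Deviation = distance from each data point to the mean.
deviations = [x - mean for x in scores]

# The signed deviations cancel out, so their sum (and average) is zero.
sum_of_deviations = sum(deviations)

# SS = sum of squared deviations.
ss = sum(d ** 2 for d in deviations)

# Variance = SS / N (mean squared deviation); SD = its square root.
variance = ss / n
sd = sqrt(variance)

print(mean, sum_of_deviations, ss, variance, sd)
```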

Measures of Central Tendency

  • Mean = arithmetic mean (i.e., the average)
  • Mode: the most frequent score in a distribution
    • Mode is unaffected by extreme scores.
    • Modes are generally useful only in unimodal distributions
    • Generally best used for qualitative data (i.e. nominal scale), where the values are categories rather than numbers
  • Median: the middle value of the ordered distribution scores
    • The median is unaffected by extreme scores.
  • Measure of Central Tendency = a single value that is typical of the data set
  • Data = a measurement collected on a variable as a consequence of


  • Case = a single unit of observation
  • Variable = a certain characteristic of a population that can take different values.
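
As a rough illustration of the mean, median and mode defined in this section, the sketch below uses Python's standard `statistics` module on made-up scores, then swaps the top score for an extreme value to show which measures move.

```python
from statistics import mean, median, mode

# Made-up scores, for illustration only.
scores = [2, 3, 3, 4, 5]

# Same data, but the highest score is replaced by an extreme value.
with_outlier = [2, 3, 3, 4, 50]

summary = {
    "original": (mean(scores), median(scores), mode(scores)),
    "with outlier": (mean(with_outlier), median(with_outlier), mode(with_outlier)),
}

for label, (m, med, mo) in summary.items():
    print(f"{label}: mean={m}, median={med}, mode={mo}")
```

Only the mean shifts when the extreme score is introduced; the median and the mode stay where they were.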

Population v Sample

  • Population = entire group of people
    • Parameters = term for summary properties or measures of population values

  • Sample = small subset of the population
    • Statistics = term for summary properties or measures of sample values

  • Arithmetic mean = X̄ (X with a bar on top), or M



  • z-score (or standard score) = the deviation of the i-th case from the mean divided by the standard deviation: z = (X − M) / SD
    • z-scores are a precise index of how an individual compares with the rest of the group, in terms of distance from the mean measured in standard deviations.
    • They also provide a metric for comparing performance/measurements on completely unrelated scales (Bradman vs Einstein).
    • Transforming any distribution of raw scores into z-scores results in a distribution with a mean of 0 and a standard deviation of 1.
    • A negative z-score means that the original score was below the mean. A positive z-score means that the original score was above the mean.
    • The area under the distribution curve between any two z-score values represents the number, proportion or percentage of the scores that fall between those two values.
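
A small sketch of the z-score transformation, assuming the standard deviation is computed with N in the denominator (`statistics.pstdev`). The helper name `z_scores` and the data are made up for illustration.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return each value's deviation from the mean in SD units."""
    m = mean(values)
    sd = pstdev(values)  # divides by N, matching the definitions in these notes
    return [(x - m) / sd for x in values]

# Made-up raw scores, for illustration only.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(raw)

print(z)
print(mean(z), pstdev(z))  # mean of 0 and standard deviation of 1
```

Scores below the mean come out negative and scores above it come out positive, whatever the original units were.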


Central Limit Theorem

  • The Central Limit Theorem: the distribution of sample means approaches a normal distribution as the sample size grows
    • (even if the population distribution is not normal).
    • The mean of all the sample means equals the population mean.
  • The sampling distribution = frequency distribution of sample means
  • Standard Error of the Mean - SEM (or σM) = standard deviation of the sampling distribution
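
The simulation below is a rough sketch of these ideas, not a proof. It assumes a deliberately skewed, made-up population (exponential), a sample size of 30, and 2,000 repeated samples; all of these choices are arbitrary. It also uses the standard result SEM = σ / √n, which is not stated in the notes above.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# A skewed (non-normal) made-up population.
population = [random.expovariate(1.0) for _ in range(20_000)]

n = 30  # size of each sample
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]

pop_mean = mean(population)
pop_sd = pstdev(population)

# Mean of the sampling distribution vs. the population mean.
mean_of_means = mean(sample_means)

# SD of the sampling distribution (the SEM) vs. sigma / sqrt(n).
sem_observed = pstdev(sample_means)
sem_expected = pop_sd / n ** 0.5

print(pop_mean, mean_of_means)
print(sem_expected, sem_observed)
```

The two printed pairs should agree closely, though not exactly, because the simulation draws a finite number of samples.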


Confidence Interval

  • Confidence interval - an estimated range of values which is likely to include an unknown population parameter
    • The boundaries within which we believe the true value of the mean will fall.
Determining 99% confidence interval for population mean
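
One way to sketch this calculation is shown below. It assumes the interval is built as M ± z × SEM with the normal critical value (about 2.58 for 99%), which is a large-sample approximation; small samples normally use a t critical value instead. The function names and the data are made up for illustration.

```python
from statistics import NormalDist, mean, stdev

def critical_z(confidence):
    """Two-tailed critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def confidence_interval(sample, confidence=0.99):
    """Approximate CI for the population mean: M +/- z * SEM."""
    n = len(sample)
    m = mean(sample)
    sem = stdev(sample) / n ** 0.5  # estimated standard error of the mean
    margin = critical_z(confidence) * sem
    return m - margin, m + margin

# Made-up sample, for illustration only.
sample = [100, 102, 98, 101, 99, 103, 97, 100, 104, 96]
low, high = confidence_interval(sample, confidence=0.99)

print(round(critical_z(0.99), 3))
print(round(low, 2), round(high, 2))
```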


Null Hypothesis

  • Null hypothesis H0 - refers to some null or conservative state of affairs; the assumption you would make without evidence to the contrary.
  • H0 contains the argument “that the observed results occurred by chance due to fluctuations of sampling”.
  • Alternative hypothesis H1 - the complement of the null hypothesis. It is not tested directly but adopted upon a rejection of the null hypothesis. It usually expresses the experimenter’s belief about the parameter being studied.
Testing Null Hypothesis:


  • t = standardised difference between two means
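  • Significance level is conventionally set at α = 0.05, so the two-tailed critical value is t ≈ 1.96 for large samples (it is larger for small samples)
  • If |t(mean1 − mean2)| > critical t, then p < 0.05 and H0 is rejected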
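
A minimal sketch of this decision rule follows. The notes do not say which form of the t statistic is used, so this assumes two independent samples and an unpooled (Welch-style) standard error; the function name and data are made up, and 1.96 is only the large-sample critical value.

```python
from statistics import mean, variance

def t_statistic(a, b):
    """Difference between two sample means divided by its standard error."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

# Made-up scores for two groups, for illustration only.
group1 = [5, 6, 7, 8, 9]
group2 = [1, 2, 3, 4, 5]

CRITICAL_T = 1.96  # large-sample, two-tailed value for alpha = 0.05

t = t_statistic(group1, group2)
reject_h0 = abs(t) > CRITICAL_T

print(t, reject_h0)
```

With samples this small, a proper test would look up a larger critical value based on the degrees of freedom.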

Symmetric v Skewed

  • Unimodal = one mode = one value that occurs most frequently
  • Negatively skewed distribution: the left tail is longer, observations are clustered towards higher end of the scale
  • Positively skewed distribution: the right tail is longer, observations are clustered towards lower end of the scale
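
The direction of skew can be checked numerically. The sketch below assumes the average cubed z-score as the skewness measure (one common choice, not named in the notes); the data are made up, and the second list is simply the mirror image of the first.

```python
from statistics import mean, pstdev

def skewness(values):
    """Average cubed z-score: positive for a long right tail, negative for a long left tail."""
    m = mean(values)
    sd = pstdev(values)
    return sum(((x - m) / sd) ** 3 for x in values) / len(values)

# Made-up data: most scores are low, with one high score (long right tail).
right_tailed = [1, 1, 2, 2, 3, 10]

# Mirror image: most scores are high, with one low score (long left tail).
left_tailed = [11 - x for x in right_tailed]

print(skewness(right_tailed))  # positive
print(skewness(left_tailed))   # negative
```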



  • Percentile rank = the percentage of scores in a distribution that are less than or equal to a specific score = (CF/N) * 100, where CF is the cumulative frequency up to that score and N is the number of scores.
    • The percentile rank shows how an individual score compares to the other scores in the sample.
    • Percentiles are limited because the scores are merely ordered.
    • The distance between the scores is not specified.
  • Percentile score: the score corresponding to a particular percentile rank.
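
The (CF/N) * 100 formula can be sketched as follows. The function names and scores are made up for illustration, and `percentile_score` uses one simple convention (the lowest score whose percentile rank reaches the target); other conventions exist.

```python
def percentile_rank(scores, score):
    """Percentage of scores less than or equal to `score`: (CF / N) * 100."""
    cf = sum(1 for s in scores if s <= score)  # cumulative frequency
    return cf / len(scores) * 100

def percentile_score(scores, rank):
    """Lowest score whose percentile rank is at least `rank` (one convention)."""
    for s in sorted(scores):
        if percentile_rank(scores, s) >= rank:
            return s
    return max(scores)

# Made-up test scores, for illustration only.
scores = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]

print(percentile_rank(scores, 70))   # 5 of 10 scores are <= 70
print(percentile_score(scores, 50))  # score at the 50th percentile
```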


  • Cumulative frequency = the running total of counts: the count for the current score (or interval) plus the counts for all lower scores.
    • I.e. the number of scores up to and including the current score i.
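
A short sketch of building a frequency table with a cumulative frequency column, using made-up scores. It relies only on `collections.Counter` and `itertools.accumulate` from the standard library.

```python
from collections import Counter
from itertools import accumulate

# Made-up scores, for illustration only.
scores = [3, 1, 2, 3, 3, 2, 5]

freq = Counter(scores)
values = sorted(freq)                    # distinct scores, lowest first
counts = [freq[v] for v in values]       # frequency of each score
cumulative = list(accumulate(counts))    # running total of the counts

for value, f, cf in zip(values, counts, cumulative):
    print(f"score={value}  f={f}  CF={cf}")
```

The last cumulative frequency always equals N, the total number of scores.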


  • Qualitative variables = attributes of the variable fall into discrete categories (e.g. gender, favorite color, country of birth)
  • Quantitative variables = attributes of the variable are assigned values that can be anywhere within a range (e.g. age, weight, height, IQ, speed of driving)

Measurement Scales

  • Nominal scale = identity
    • Used for categorical/discrete data
    • Any case can be placed in one and only one category.
    • Numbers used as labels; arbitrary


  • Ordinal scale = identity + order
    • Used where scores can be ranked / ordered.
    • There is no objective distance between any two points on the scale.


  • Interval scale = identity + order + equidistance
    • Measurement at this level allows us to separate objects or events into mutually exclusive categories, arranged in a specific order, and specify the distance between data points
    • On this scale numbers are separated by equal-sized intervals but have no meaningful or absolute zero.
    • Ratios are not meaningful, e.g. IQ 140 is not twice as high as IQ 70.


  • Ratio scale = identity + order + equidistance + origin
    • separate objects or events into mutually exclusive categories
    • arranged in a specific order
    • specify the distance between data points
    • compare ratios constructed from the data.



  • Horizontal Axis
    • also called the abscissa, or X axis
    • the values of the variable
  • Vertical Axis:
    • also called the ordinate, or Y axis
    • the frequencies, or proportions or percentages

Class Intervals

When choosing a class interval width, one aims to produce a concise picture of the data with minimal loss of information. Generally, use 6–12 intervals of equal width.
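
A rough sketch of grouping scores into equal-width class intervals. The function name, the choice of 6 intervals, and the data are all assumptions for illustration; it also assumes the scores are not all identical (otherwise the width would be zero).

```python
def class_intervals(scores, k):
    """Split the range of `scores` into k equal-width intervals and count each."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / k
    counts = [0] * k
    for s in scores:
        # The maximum score falls on the upper edge, so clamp it into the last interval.
        i = min(int((s - lo) / width), k - 1)
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]

# Made-up scores, for illustration only.
scores = [12, 15, 21, 22, 28, 30, 33, 35, 36, 41, 44, 47, 52, 58, 60, 63, 67, 72]
table = class_intervals(scores, k=6)

for lower, upper, count in table:
    print(f"{lower:5.1f} to {upper:5.1f}: {count}")
```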




  • The Pearson product-moment correlation coefficient r is sensitive only to linear relationships.
  • Correlation != Causation. To test causation, experimental designs are best: systematically manipulate X and measure Y.
  • If we know the correlation between two variables (e.g. r = 0.61) and we have a z-score for X, the predicted z-score for Y is the product of the two: predicted zY = r × zX.
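
The sketch below computes r as the average product of paired z-scores and then applies the prediction rule from the last bullet. The function names and data are made up for illustration; the second data set is a perfect curve (Y = X squared) chosen to show that r can be zero even when the variables are strongly related.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson r as the average product of paired z-scores."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / len(xs)

def predict_z_y(r, z_x):
    """Predicted z-score for Y given r and the z-score for X."""
    return r * z_x

# Made-up data: a perfectly linear relationship.
r_linear = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Made-up data: a perfect but non-linear relationship (Y = X squared).
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

# Using the correlation from the notes (0.61) and an assumed z-score of 2.0 for X.
z_y = predict_z_y(0.61, 2.0)

print(r_linear, r_curved, z_y)
```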