Introduction to Descriptive and Inferential Statistics
Descriptive statistics summarize and organize the characteristics of a data set using measures such as the mean and standard deviation. Inferential statistics, by contrast, use a random sample drawn from a population to draw conclusions about that population. They are used for hypothesis testing, assessing relationships between variables, and making predictions.
Types of Data
Nominal: Categorical data without an inherent order.
Ordinal: Categorical data with a defined order, where the gaps between categories are not necessarily equal.
Interval: Numerical data with equal intervals but no true zero.
Ratio: Numerical data with equal intervals and a true zero.
Graphical Representations
Visualize data to identify patterns, trends, and outliers.
- Bar Chart: Represents categorical data with rectangular bars.
- Histogram: Represents the distribution of numerical data.
- Box Plot: Visual representation of the five-number summary (Minimum, Q1, Median, Q3, Maximum).
- Scatter Plot: Shows the relationship between two quantitative variables.
Measures of Central Tendency
Provide a central value for the data set.
Mean (Average): \(\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\)
Median: Middle value when the data are ordered; for an even number of values, the mean of the two middle values.
Mode: Most frequently occurring value.
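The three measures above can be computed with Python's standard `statistics` module. This is a minimal sketch; the list `data` is a made-up example, not data from these notes.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of values divided by n (30 / 6)
median = statistics.median(data)  # even n: mean of the two middle values (3 and 5)
mode = statistics.mode(data)      # most frequently occurring value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

If several values tie for most frequent, `statistics.multimode` returns all of them as a list.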
Measures of Dispersion
Indicate the spread or variability of a data set.
Range: Difference between the highest and lowest values.
Variance: Average of the squared differences from the Mean.
\(\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}\) for a population of size \(N\) with mean \(\mu\),
\(s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}\) for a sample.
Standard Deviation: Square root of the variance. \(\sigma\) for population, \(s\) for sample.
Interquartile Range (IQR): Difference between the 75th percentile (Q3) and the 25th percentile (Q1).
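As a rough illustration of the dispersion measures above, the sketch below uses the standard `statistics` module. The list `data` is hypothetical. Quartile conventions differ between tools; this example relies on the default "exclusive" method of `statistics.quantiles`, so other software may report slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(data) - min(data)     # highest minus lowest value

pop_var = statistics.pvariance(data)    # population variance: divide by N
sample_var = statistics.variance(data)  # sample variance: divide by n - 1

pop_sd = statistics.pstdev(data)        # square root of the population variance
sample_sd = statistics.stdev(data)      # square root of the sample variance

# Quartiles: n=4 returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("range:", value_range)
print("population variance:", pop_var, "sample variance:", sample_var)
print("population sd:", pop_sd, "sample sd:", sample_sd)
print("IQR:", iqr)
```

The sample variance is always larger than the population variance computed on the same values, because it divides the same sum of squares by \(n-1\) instead of \(n\).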
Z-Scores
Measure of how many standard deviations an element is from the mean.
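\(z = \frac{x - \mu}{\sigma}\) where \(x\) is a value from the population, \(\mu\) is the mean of the population, and \(\sigma\) is the standard deviation of the population. For a sample, replace \(\mu\) and \(\sigma\) with the sample mean \(\bar{x}\) and sample standard deviation \(s\).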
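A short sketch of the z-score calculation, treating a made-up list as the whole population. The helper name `z_score` and the data are assumptions introduced for illustration.

```python
import statistics

# Hypothetical population, chosen only for illustration.
population = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

z_scores = [z_score(x, mu, sigma) for x in population]
print(z_scores)
```

A positive z-score means the value is above the mean and a negative one means it is below. The z-scores of a complete data set always sum to zero.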
Skewness and Kurtosis
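Skewness: Measure of the asymmetry of a distribution. Positive skewness indicates a longer right tail, negative skewness a longer left tail.
Kurtosis: Measure of the 'tailedness' of a distribution. A normal distribution has a kurtosis of 3.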
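The notes give no formulas for these two measures. One common moment-based definition, assumed here, is skewness \(= m_3 / m_2^{3/2}\) and kurtosis \(= m_4 / m_2^2\), where \(m_k\) is the \(k\)-th central moment. The sketch below implements that definition; the function names and data are hypothetical, and statistical packages may apply bias corrections or report excess kurtosis (kurtosis minus 3) instead.

```python
def central_moment(data, k):
    """k-th central moment: average of (x - mean)**k over the data."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (no bias correction)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Moment-based kurtosis: m4 / m2**2 (not excess kurtosis)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

# Hypothetical data sets, chosen only for illustration.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]

print("skewness (symmetric):", skewness(symmetric))
print("skewness (right-tailed):", skewness(right_tailed))
print("kurtosis (symmetric):", kurtosis(symmetric))
```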
Frequency Distributions
A frequency distribution shows how often each value, or class of values, occurs in a data set. It can be presented as a table or as a graph, such as a histogram or pie chart, making it easy to see where the data are concentrated and how widely they vary.
Cumulative Frequency
Cumulative frequency is a running total of frequencies through the classes of a frequency distribution. It gives the number of observations at or below a particular value (or class boundary) in a data set.
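A frequency table and its running total can be built as follows. This is a minimal sketch on ungrouped, made-up values; the variable names are assumptions for illustration.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical scores, chosen only for illustration.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]

freq = Counter(scores)                 # value -> number of occurrences
values = sorted(freq)                  # distinct values in ascending order
counts = [freq[v] for v in values]     # frequency of each value
cumulative = list(accumulate(counts))  # running total of the frequencies

print("value  freq  cum_freq")
for v, f, c in zip(values, counts, cumulative):
    print(f"{v:>5}  {f:>4}  {c:>8}")
```

The last cumulative frequency always equals the total number of observations.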
Percentiles and Quartiles
Percentiles are the 99 cut points that divide an ordered set of observations into 100 equal parts; quartiles are the 3 cut points that divide it into 4 equal parts, so Q1, Q2 (the median), and Q3 are the 25th, 50th, and 75th percentiles. They describe the distribution and dispersion of the data and show where a particular data point stands relative to the others.
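The relationship between percentiles and quartiles can be checked with `statistics.quantiles`. The data and the `percentile_rank` helper are hypothetical; the helper uses one simple convention (percentage of values at or below `x`), and other definitions exist.

```python
import statistics

# Hypothetical data: the integers 1 through 100.
data = list(range(1, 101))

percentiles = statistics.quantiles(data, n=100)  # 99 cut points
quartiles = statistics.quantiles(data, n=4)      # 3 cut points: Q1, Q2, Q3

def percentile_rank(data, x):
    """Percentage of observations that are less than or equal to x."""
    return 100 * sum(1 for v in data if v <= x) / len(data)

print("25th percentile:", percentiles[24], "Q1:", quartiles[0])
print("50th percentile:", percentiles[49], "Q2:", quartiles[1])
print("75th percentile:", percentiles[74], "Q3:", quartiles[2])
print("percentile rank of 90:", percentile_rank(data, 90))
```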
Outliers
Outliers are data points that differ significantly from the other observations. They can indicate variability in measurement, experimental error, or genuinely novel observations. Because outliers can distort measures such as the mean and standard deviation, identifying them is crucial for accurate statistical analysis.
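These notes do not prescribe a detection rule. One widely used convention, assumed here, is the 1.5 × IQR rule (Tukey's fences): flag any value below \(Q1 - 1.5 \times IQR\) or above \(Q3 + 1.5 \times IQR\). The sketch below applies it to made-up data; the function name `iqr_outliers` and the multiplier default are illustrative choices, and the quartiles come from the default method of `statistics.quantiles`.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed convention)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

# Hypothetical data with one value far from the rest.
data = [2, 4, 4, 4, 5, 5, 7, 9, 30]

outliers = iqr_outliers(data)
print("outliers:", outliers)
```

A flagged value is a candidate for investigation, not automatically an error. Whether to keep, correct, or remove it depends on why it is extreme.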