User:Dcljr/Statistics

This page contains terms related to probability and statistics. I've just copied it over from a subpage of my user page at Wikipedia and am in the process of paring it down to bare lists.

General terms

 * statistics, statistic, statistical
 * mathematics, mathematical
 * data
 * discipline
 * academic
 * field
 * physical science
 * social science, social sciences
 * observation
 * descriptive statistics
 * census
 * population, populations
 * sample, samples, sampling
 * inferential statistics
 * random, randomness
 * uncertainty
 * probability, probabilities
 * hypothesis testing, hypothesis test, hypothesis tests, test of hypotheses, tests of hypotheses, testing hypotheses
 * significance testing, significance test, significance tests, test of significance, tests of significance, testing for significance
 * estimation, point estimation, interval estimation
 * estimate, point estimate, interval estimate
 * estimates, point estimates, interval estimates
 * prediction
 * association, correlation, relationship
 * associated, correlated, related
 * regression, linear regression, simple linear regression, multiple linear regression, least squares regression, least-squares regression
 * regress, regressing, regressed, regresses
 * inference, inferences, statistical inference, statistical inferences
 * applied statistics
 * statistical theory, theory of statistics, theoretical statistics, mathematical statistics
 * applied mathematics
 * probability theory, theory of probability, theoretical probability, mathematical probability
 * mathematical analysis
 * Bureau of Labor Statistics, United States Census Bureau, Statistical Abstract of the United States

Etymology

 * Latin phrase statisticum collegium (lecture about state affairs)
 * Italian word statista (statesman or politician)
 * German Statistik (originally the analysis of data about the state)
 * statistical service

Definitions
Some textbook definitions of statistics and related terms (italics added):


 * Stephen Bernstein and Ruth Bernstein, Schaum's Outline of Elements of Statistics II: Inferential Statistics (1999)
 * Statistics is the science that deals with the collection, analysis, and interpretation of numerical information.


 * In descriptive statistics, techniques are provided for collecting, organizing, summarizing, describing, and representing numerical information.


 * [Inferential statistics provides] techniques.... for making generalizations and decisions about the entire population from limited and uncertain sample information.


 * Donald A. Berry, Statistics: A Bayesian Perspective (1996)
 * Statistical inferences have two characteristics:
 * Experimental or observational evidence is available or can be gathered.
 * Conclusions are uncertain.


 * John E. Freund, Mathematical Statistics, 2nd edition (1971)
 * Statistics no longer consists merely of the collection of data and their representation in charts and tables -- it is now considered to encompass not only the science of basing inferences on observed data, but the entire problem of making decisions in the face of uncertainty.


 * Gouri K. Bhattacharyya and Richard A. Johnson, Statistical Concepts and Methods (1977)
 * Statistics is a body of concepts and methods used to collect and interpret data concerning a particular area of investigation and to draw conclusions in situations where uncertainty and variation are present.


 * E. L. Lehmann, Theory of Point Estimation (1983)
 * Statistics is concerned with the collection of data and with their analysis and interpretation.


 * William H. Beyer (editor), CRC Standard Probability and Statistics Tables and Formulae (1991)
 * The pursuit of knowledge frequently involves data collection; and those responsible for the collection must appreciate the need for analyzing the data to recover and interpret the information therein. Today, statistics are being accepted as the universal language for the results of experimentation and research and the dissemination of information.


 * Oscar Kempthorne, The Design and Analysis of Experiments, reprint edition (1973)
 * Statistics enters [the scientific method] at two places:
 * The taking of observations
 * The comparison of the observations with the predictions from... theory.


 * Marvin Lentner and Thomas Bishop, Experimental Design and Analysis (1986)
 * The information obtained from planned experiments is used inductively. That is, generalizations are made about a population from information contained in a random sample of that particular population. ... [Such] inferences and decisions... are sometimes erroneous. Proper statistical analyses provide the tools for quantifying the chances of obtaining erroneous results.


 * Robert L. Mason, Richard F. Gunst and James L. Hess, Statistical Design and Analysis of Experiments (1989)
 * Statistics is the science of problem-solving in the presence of variability.


 * Statistics is a scientific discipline devoted to the drawing of valid inferences from experimental or observational data.


 * Stephen K. Campbell, Flaws and Fallacies in Statistical Thinking (1974)
 * Statistics... is a set of methods for obtaining, organizing, summarizing, presenting, and analyzing numerical facts. Usually these numerical facts represent partial rather than complete knowledge about a situation, as is the case when a sample is used in lieu of a complete census.

Population vs. sample

 * population
 * sample
 * inference
 * average
 * representative sample
 * sampling

Randomness, probability and uncertainty

 * randomness
 * unknowable
 * sample space
 * probability
 * likelihood
 * chance
 * probability zero / zero probability
 * probability one
 * uncertainty
 * relative frequency
 * trial
 * Bayesian
 * Bayesian probability
 * Bayesian statistics
 * equally likely
 * fair
 * frequentist
 * simple random sample

Prior information and loss

 * probabilistic
 * probability distribution

Sampling

 * Main article: Sampling (statistics)

Experimental design

 * Main article: Design of experiments

Data summary: descriptive statistics

 * Main article: Descriptive statistics

Levels of measurement

 * Main article: Level of measurement


 * Qualitative (categorical)
 * Nominal
 * Ordinal
 * Quantitative (numerical)
 * Interval
 * Ratio

Graphical summaries

 * Main article: ?

Numerical summaries

 * Main article: Summary statistics

Data interpretation: inferential statistics

 * Main article: Statistical inference

Estimation

 * Main article: Statistical estimation

Prediction

 * Main article: Statistical prediction

Hypothesis testing

 * Main article: Statistical hypothesis testing

Correlation

 * correlation
 * correlated
 * positively correlated
 * negatively correlated
 * interval
 * ratio
 * scatterplot
 * trend
 * ordinal
 * zero correlation / correlation zero
 * correlation one
 * perfect positive correlation, perfect negative correlation, perfect correlation
 * positive correlation, negative correlation
 * Pearson product-moment correlation coefficient / Pearson's product-moment correlation coefficient
 * Pearson correlation coefficient / Pearson's correlation coefficient
 * Pearson correlation / Pearson's correlation
 * Pearson r / Pearson's r
 * Spearman rho / Spearman's rho
 * Kendall tau / Kendall's tau
 * Yule Q / Yule's Q

Regression

 * regression

Time series

 * time series

Data mining

 * data mining

Statistical practice and methods

 * Data collection
 * Statistical planning
 * Sampling
 * Probability sampling
 * Simple random sampling
 * Systematic sampling
 * Stratified sampling
 * Cluster sampling
 * Multistage sampling
 * Non-probability sampling
 * Convenience sampling
 * Self-selective sampling (or Self selection)
 * Experimental design
 * Controlled experiment
 * Double-blind experiment
 * Data analysis
 * Descriptive statistics
 * Categorical data (or Qualitative data)
 * Univariate data
 * Pie chart
 * Bar chart
 * Pareto chart
 * Bivariate or Multivariate data
 * Contingency table (or Cross-tabulation)
 * Time series
 * Line chart (or Time series plot)
 * Numerical data (or Quantitative data)
 * Dot plot
 * Frequency table
 * Relative frequency table
 * Grouped frequency table
 * Histogram
 * Box plot
 * Statistical inference
 * Statistical estimation
 * Point estimation
 * Interval estimation
 * Hypothesis testing
 * z test
 * t test
 * F test
 * Analysis of variance (ANOVA)
 * Chi-squared test
 * Statistical modeling
 * Simple linear regression
 * Multiple regression
 * Curvilinear regression
 * Drawing conclusions

Statistics in other fields

 * Biostatistics
 * Business statistics
 * Chemometrics
 * Demography
 * Economic statistics
 * Engineering statistics
 * Epidemiology
 * Geostatistics
 * Psychometrics
 * Statistical physics

Subfields or specialties in statistics

 * Mathematical statistics
 * Reliability
 * Survival analysis
 * Quality control (or Quality assurance)
 * Time series
 * Categorical data analysis
 * Multivariate statistics
 * Large-sample theory
 * Bayesian statistics (or Bayesian inference, Bayesian analysis)
 * Regression analysis (or just Regression)
 * Sampling theory (or just Sampling)
 * Experimental design (or Design of experiments)
 * Statistical computing (or Computational statistics; see also Scientific computing)
 * Nonparametric statistics (Nonparametrics, Nonparametric inference, Nonparametric regression)
 * Density estimation
 * Simultaneous inference
 * Linear inference
 * Optimal inference
 * Decision theory (Statistical decision theory)
 * Experimental design and analysis (Experimental design, Design and analysis of experiments, Design of experiments)
 * Linear models (Linear model)
 * Multivariate analysis (Multivariate statistics)
 * Data modeling
 * Sequential analysis
 * Spatial statistics

Probability:
 * Stochastic processes
 * Queueing theory

Related areas of mathematics

 * Probability
 * Set theory
 * Finite mathematics
 * Discrete mathematics
 * Combinatorics
 * Analysis
 * Calculus
 * Real analysis
 * Integration theory
 * Measure theory
 * Probability theory
 * Distribution theory
 * Asymptotics
 * Linear algebra
 * Matrix theory
 * Numerical analysis
 * Scientific computing

Also: Statistical physics

Typical course in mathematical probability
Below are the topics typically (?) covered in a one-year course introducing the mathematical theory of probability to undergraduate students in mathematics and statistics. (Actually, this list contains much more material than is typically covered in one year.)

Topics of a more advanced nature are italicized, including those typically only covered in mathematical statistics or graduate-level probability theory courses (e.g., topics requiring measure theory). See also the mathematical statistics course outlined below.


 * Interpretation of probability
 * Frequency interpretation of probability
 * Classical interpretation of probability
 * Subjective interpretation of probability
 * Random experiments
 * Outcomes (or Simple event)
 * Sample space
 * Events
 * Set theory
 * Universal set
 * Empty set
 * Union
 * Intersection
 * Complement
 * Mutually exclusive events (or Disjoint events)
 * Measure theory
 * Measure space
 * Borel sets
 * Sequence of sets
 * Countable union
 * Countable intersection
 * Nested sets
 * Measure
 * Definition of probability (or Probability measure)
 * Axioms of probability (or Probability axioms, or Probability axiom)
 * Properties of probability
 * General addition rule
 * Equally likely outcomes
 * Counting methods
 * Multiplication principle
 * Factorial
 * Permutation
 * Combination
 * Binomial coefficients
 * Pascal's triangle
 * Multinomial coefficients
 * Independent events
 * Joint probability
 * Marginal probability
 * Conditional probability
 * General multiplication rule
 * Law of total probability
 * Bayes' theorem
 * Prior probability
 * Posterior probability
 * Markov chain
 * Transition matrix
 * Famous problems in probability
 * Birthday problem
 * False-positive problem (or False-Positive problem?)
 * Sensitivity vs. Specificity
 * Gambler's Ruin problem
 * Matching problem
 * Monty Hall problem
 * Prisoner's Dilemma
 * Optimal Stopping problem
 * Random variable
 * Probability distribution
 * Probability function (pf)
 * Support of a probability function
 * Discrete random variable
 * Probability mass function (pmf)
 * Continuous random variable
 * Probability density function (pdf)
 * Cumulative distribution function (cdf) (Note: Distribution function is now about physics -- df)
 * Mixed probability distribution (i.e., discrete and continuous parts -- name??)
 * Distribution of a function of a random variable
 * Direct method for deriving the pdf of a function of a random variable
 * Probability integral transform (or Probability integral transformation)
 * Expectation
 * Expected value or Mean
 * Variance
 * Standard deviation
 * Moment
 * Central moment
 * Absolute moment
 * Moment generating function (mgf)
 * Non-uniqueness of moments
 * Characteristic function (cf)
 * ...
 * Joint distribution (Joint probability distribution)
 * Joint probability mass function (Joint pmf)
 * Joint probability density function (Joint pdf)
 * Joint distribution function (Joint cdf)
 * Marginal distribution (Marginal probability distribution, Marginal density, Marginal density function, Marginal probability density function, Marginal probability mass function, Marginal distribution function, Marginal probability distribution function)
 * Independent random variables
 * Independent identically-distributed random variables (iid)
 * Random sample
 * Order statistics
 * Distribution of the sample range
 * Conditional distribution (Conditional probability distribution, Conditional density, Conditional density function, Conditional probability density function, Conditional probability mass function, Conditional distribution function, Conditional probability distribution function)
 * Hierarchical probability model
 * Borel-Kolmogorov paradox
 * Bivariate distribution
 * Multivariate distribution
 * Distribution of a function of two or more random variables
 * Jacobian
 * Distribution of a linear transformation of random variables
 * Convolution


 * List of probability distributions (or Table of probability distributions)
 * Discrete probability distributions
 * Discrete uniform distribution (Discrete-uniform distribution?)
 * Bernoulli distribution
 * Binomial distribution
 * Geometric distribution
 * Negative binomial distribution (or Pascal negative binomial distribution, Negative-binomial distribution, Pascal negative-binomial distribution)
 * Hypergeometric distribution
 * Poisson distribution
 * Zeta distribution (or Zipf distribution)
 * Continuous probability distributions (see also Sampling distributions below)
 * Uniform distribution (or Rectangular distribution)
 * Triangular distribution
 * Beta distribution
 * Exponential distribution
 * Double exponential distribution (or Double-exponential distribution, Laplace distribution)
 * Gamma distribution
 * Erlang distribution
 * Maxwell distribution
 * Weibull distribution
 * Rayleigh distribution
 * Gumbel distribution
 * Inverted gamma distribution (or Inverted-gamma distribution)
 * Normal distribution (or Gaussian distribution)
 * Standard normal distribution (or Z distribution, Z-distribution, Standard-normal distribution)
 * Lognormal distribution
 * Half-normal distribution (Half normal distribution?)
 * Cauchy distribution
 * Pareto distribution
 * Logistic distribution
 * Hyperbolic secant distribution (Hyperbolic-secant distribution)
 * Slash distribution
 * Mixture distributions (Hierarchical probability distribution?)
 * Beta-binomial distribution
 * ...

order?


 * Probability tables (e.g., Z table)
 * Upper-tail probability
 * Critical value


 * Relationships among probability distributions (List or Table...)
 * Special cases
 * ...
 * Limit relationships
 * Approximation of one distribution by another
 * Poisson approximation to the binomial
 * Normal approximation to the binomial


 * Other Properties of the cumulative distribution function
 * Memory (or Memoryless property, or whatever)
 * Hazard function
 * Stochastic order or Stochastic ordering (Stochastically greater, Stochastically smaller, Stochastically increasing, Stochastically decreasing)


 * Sampling distributions
 * Sampling distribution of the sample mean
 * Central Limit Theorem
 * t distribution (or t-distribution, Student's t distribution, Student's t-distribution)
 * Degrees of freedom (Degree of freedom)
 * Sampling distribution of the sample variance
 * Chi-squared distribution (Chi-square distribution, Chi squared distribution)
 * F distribution (F-distribution)


 * Family of probability distributions (or Probability distribution family, Distribution family, etc.?)
 * Location-scale family
 * Exponential family
 * Exponential power family
 * Simulation (Generating random numbers with a given distribution, Generating random observations, Generating random numbers, Generating observations from a probability distribution, Generating observations on a random variable, etc. -- "pseudo-random" on all these, too)
 * Pseudorandom numbers (see Pseudorandom, Pseudo-random number, Pseudo-random)
 * Random number table (and Table of random digits -- former is how to use, latter an actual table)
 * Pseudorandom variables (Pseudo-random variable)


 * ...


 * And so on, and so forth...

Typical course in mathematical statistics
Would cover many of the topics from the probability course outlined above, plus...


 * ...


 * And so on, and so forth...

Typical course in applied statistics
Less theoretical than the courses outlined above. (Sometimes portions of the following form the basis of a second statistics course for mathematics majors, or the third in the sequence if probability is the first course.)


 * Statistical charts
 * Frequency distribution (Relative..., Cumulative..., Grouped...)
 * Stem-and-leaf display (Stem and leaf display, Stem-and-leaf diagram, Stem and leaf diagram, Stem and leaf)
 * Contingency table
 * Statistical plots (Statistical graphs)
 * Bar chart
 * Pareto plot
 * Pie chart (Pie graph)
 * Line plot (Line graph)
 * Time plot
 * Frequency polygon
 * Ogive
 * Histogram
 * Box plot (Box-and-whisker plot, Box and whisker plot)
 * Quantile-quantile plot (Q-Q plot, Q-q plot)
 * Chernoff face


 * ...


 * List of experimental designs
 * Completely randomized design (CR design, CR)
 * Randomized block design (RB design, RB)
 * Randomized complete block design (RCB design, RCB)
 * Latin square design (LS design, LS)
 * Graeco-Latin square design
 * Crossover design
 * Repeated Latin square design (RLS design, RLS)
 * Factorial design
 * Knut Vik square design
 * Hierarchically nested design
 * Split-plot design (SP design, SP)
 * Split-block design
 * Split-split-plot design
 * Quasifactorial design
 * Lattice design
 * Incomplete block design (IB design, IB)
 * Fractional factorial design
 * Fractional-replication design
 * Half replicate design
 * Half fraction of a factorial design
 * Completely balanced lattice design
 * Rectangular lattice design
 * Triple rectangular lattice design
 * Balanced incomplete block design (BIB design, BIB)
 * Cyclic design
 * Alpha-design ("α-design")
 * Incomplete Latin square design
 * Youden square design
 * Partially balanced incomplete block design (PBIB design, PBIB)
 * Repeated measures design


 * ...


 * And so on, and so forth...

Bayesian analysis
Hmm...

Terms from categorical data analysis
(By chapter: Agresti, 1990.)


 * 1) (none)
 * 2) contingency table, two-way table, two-way contingency table, cross-classification table, cross-tabulation, relative risk, odds ratio, concordant pair, discordant pair, gamma, Yule's Q, Goodman and Kruskal's tau, concentration coefficient, Kendall's tau-b, Somers' d, proportional prediction, proportional prediction rule, uncertainty coefficient, Gini concentration, entropy (variation measure), tetrachoric correlation, contingency coefficient, Pearson's contingency coefficient, log odds ratio, cumulative odds ratio, Goodman and Kruskal's lambda, observed frequency
 * 3) expected frequency, independent multinomial sampling, product multinomial sampling, overdispersion, chi-squared goodness-of-fit test, goodness-of-fit test, Pearson's chi-squared statistic, likelihood-ratio chi-squared statistic, partitioning chi-squared, Fisher's exact test, multiple hypergeometric distribution, Freeman-Halton p-value, phi-squared, power divergence statistic, minimum discrimination information statistic, Neyman modified chi-squared, Freeman-Tukey statistic, ...

Statistical software
List of statistical software or List of statistical software packages...

Commercial

 * CART
 * ECHIPS (EChips)
 * Excel
 * add-ins: Analyse-It, SigmaXL, statistiXL, WinSTAT, XLSTAT (XLSTAT)
 * JMP
 * Minitab
 * NCSS
 * nQuery
 * PASS
 * SAS System (SAS)
 * S
 * descendants: S-PLUS (S-Plus), S2, S3, S4, S5, S6
 * SPSS
 * Stata
 * STATISTICA (Statistica)
 * StatXact, LogXact
 * SUDAAN (Sudaan)
 * SYSTAT (Systat)

Free versions of commercial software

 * Gnumeric -- not a clone of Excel, but implements many of the same functions (can it use Excel add-ins?)
 * R -- free version of S
 * FIASCO or PSPP -- free version of SPSS

Other free software

 * BUGS -- Bayesian inference Using Gibbs Sampling
 * ESS -- a GNU Emacs add-on
 * ...
 * see http://www.psychnet-uk.com/experimental_design/software_packages.htm

Licensing unknown

 * Genstat
 * XLispStat
 * ...

World Wide Web

 * StatLib -- large repository of statistical software and data sets

Online sources of data

 * StatLib

External link

 * StatLib

And eventually...

 * Berger, James O. (1985). Statistical Decision Theory and Bayesian Analysis (2nd ed.). NY: Springer-Verlag. ISBN 0-387-96098-8. (Also, Berlin: ISBN 3-540-96098-8.)
 * Berry, Donald A. (1996). Statistics: A Bayesian Perspective. Belmont, CA: Duxbury Press. ISBN 0-534-23472-0.
 * Feller, William (1950). An Introduction to Probability Theory and Its Applications, Vol. 1. NY: John Wiley & Sons. ISBN unknown. (Current: 3rd ed., 1968, NY: John Wiley & Sons, ISBN 0-471-25708-7.)
 * Feller, William (1971). An Introduction to Probability Theory and Its Applications, Vol. 2 (2nd ed.). NY: John Wiley & Sons. ISBN 0-471-25709-5.
 * Lehmann, E. L. [Eric Leo] (1991). Theory of Point Estimation. Pacific Grove, CA: Wadsworth & Brooks/Cole. ISBN 0-534-15978-8. (Orig. 1983, NY: John Wiley & Sons.)
 * Lehmann, E. L. [Eric Leo] (1994). Testing Statistical Hypotheses (2nd ed.). NY: Chapman & Hall. ISBN 0-412-05321-7. (Orig. 2nd ed., 1986, NY: John Wiley & Sons.)