The Bayesian vs frequentist approaches: implications for machine learning – Part two

This post is the second part in a series. The first part is "The Bayesian vs frequentist approaches: implications for machine learning – Part One".

In part one, we summarized that:

There are three key points to remember when discussing the frequentist vs. the Bayesian philosophies.

  • First, as already mentioned, Bayesians assign a probability to a specific outcome.
  • Second, Bayesian inference yields probability distributions, while frequentist inference focuses on point estimates.
  • Finally, in Bayesian statistics, parameters are assigned a probability distribution, whereas in the frequentist approach the parameters are fixed. Thus, in frequentist statistics, we take random samples from the population and aim to find the set of fixed parameters of the underlying distribution that generated the data. In contrast, in Bayesian statistics we take the entire data and aim to find the parameters of the distribution that generated the data, but we treat these parameters as random variables with their own probability distributions, i.e. not fixed.


And also that:

Frequentists use three ideas to understand uncertainty: the null hypothesis, p-values and confidence intervals. These come broadly under statistical hypothesis testing for frequentist approaches.

Based on this background, we now explore the use of frequentist and Bayesian techniques in machine learning.

Sampled from a distribution: Many machine learning algorithms assume that the data is sampled from a particular distribution. For example, linear regression assumes Gaussian-distributed errors, and logistic regression assumes that the response is sampled from a Bernoulli distribution. When their parameters are fitted as fixed point estimates, these algorithms take a frequentist approach.
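To make the distributional assumption concrete, here is a minimal sketch (not a production implementation) showing that, under an assumed Gaussian noise model, the least-squares line is also the line with the highest likelihood. The toy data, the function names and the fixed noise scale `sigma` are illustrative assumptions, not something taken from part one.

```python
import math


def fit_ols(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def gaussian_log_likelihood(xs, ys, intercept, slope, sigma=1.0):
    """Log-likelihood of the data if y ~ Normal(intercept + slope * x, sigma^2)."""
    n = len(xs)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)


# Hypothetical toy data, roughly y = 1 + 2x plus some noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

intercept, slope = fit_ols(xs, ys)
best = gaussian_log_likelihood(xs, ys, intercept, slope)

# Moving the parameters away from the least-squares fit lowers the likelihood.
nudged = gaussian_log_likelihood(xs, ys, intercept + 0.1, slope - 0.05)
print(intercept, slope, best > nudged)
```

The log-likelihood depends on the parameters only through the sum of squared errors, which is why maximizing it and minimizing squared error pick the same line.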

MLE vs MAP: Frequentists use maximum likelihood estimation (MLE) to obtain a point estimate of the parameters. Bayesians use Bayes’ formula to obtain the full posterior distribution. In this context, the posterior probability is the conditional probability of the parameters after the relevant evidence is taken into account. Based on Bayes’ theorem, the maximum a posteriori (MAP) estimate, i.e. the mode of the posterior, can be used to obtain a point estimate similar to the MLE.
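As an illustration of the difference, consider estimating the bias of a coin. The sketch below assumes a Beta(alpha, beta) prior, chosen here because it gives a closed-form Beta posterior; the counts and prior values are made up for the example.

```python
def mle_estimate(heads, flips):
    """Maximum likelihood estimate of P(heads): the observed frequency."""
    return heads / flips


def posterior_params(heads, flips, alpha, beta):
    """Beta prior + binomial data gives a Beta(alpha + heads, beta + tails) posterior."""
    return alpha + heads, beta + (flips - heads)


def map_estimate(heads, flips, alpha, beta):
    """Mode of the Beta posterior (defined when both posterior parameters exceed 1)."""
    a, b = posterior_params(heads, flips, alpha, beta)
    if a <= 1 or b <= 1:
        raise ValueError("posterior mode is not interior for these parameters")
    return (a - 1) / (a + b - 2)


heads, flips = 7, 10      # hypothetical observations
alpha, beta = 2.0, 2.0    # assumed prior: mild belief that the coin is fair

print(mle_estimate(heads, flips))                   # frequentist point estimate
print(posterior_params(heads, flips, alpha, beta))  # full Bayesian posterior
print(map_estimate(heads, flips, alpha, beta))      # Bayesian point estimate
```

With this prior the MAP estimate is pulled from the MLE towards 0.5. With a flat Beta(1, 1) prior the two estimates coincide.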

Statistical paradigm: Both Bayesian and frequentist approaches are statistical paradigms and as such are used in multiple ways across the machine learning pipeline, for example in Bayesian hyperparameter optimization and Bayesian model selection.

Small data and uncommon events: Bayesian techniques are better suited when you are dealing with smaller data sizes or uncommon events, because a prior can supplement the limited evidence in the data.
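A small sketch of why this matters: suppose a rare event has not been observed at all in a handful of trials. The frequency-based estimate is exactly zero, while a Bayesian posterior still leaves room for the event. The example assumes a uniform Beta(1, 1) prior, for which the posterior after zero events in n trials is Beta(1, n + 1) and its quantiles have a closed form; the function names and the trial counts are illustrative.

```python
def frequency_estimate(events, trials):
    """Observed frequency of the event."""
    return events / trials


def posterior_mean_zero_events(trials):
    """Mean of the Beta(1, trials + 1) posterior under a uniform prior."""
    return 1.0 / (trials + 2)


def upper_credible_bound_zero_events(trials, level=0.95):
    """Upper bound p such that P(rate <= p | data) = level.

    Uses the Beta(1, b) CDF, 1 - (1 - p)^b, with b = trials + 1.
    """
    return 1.0 - (1.0 - level) ** (1.0 / (trials + 1))


trials = 20  # hypothetical: 20 trials, no events seen
print(frequency_estimate(0, trials))             # 0.0: "the event never happens"
print(posterior_mean_zero_events(trials))        # small but non-zero
print(upper_credible_bound_zero_events(trials))  # plausible upper limit on the rate
```

As the number of event-free trials grows, the upper bound shrinks, so the data gradually overrides the prior.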

Generative classifiers: The Naïve Bayes classifier is a generative classifier based on Bayes’ theorem. It performs well when the predictors are (at least approximately) conditionally independent given the class. See Naïve Bayes vs Logistic Regression.
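The following is a minimal sketch of a Naïve Bayes classifier for binary features, written from scratch to show the generative structure: it models P(class) and P(feature | class) and combines them with Bayes’ theorem. The toy spam/ham data, the feature meanings and the use of add-one (Laplace) smoothing are assumptions made for the example.

```python
import math
from collections import Counter


def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each binary feature is 1."""
    class_counts = Counter(labels)
    n_features = len(rows[0])
    ones = {c: [0] * n_features for c in class_counts}
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            ones[label][j] += value
    return class_counts, ones


def predict(model, row):
    """Return the class with the highest log P(class) + sum of log P(feature | class)."""
    class_counts, ones = model
    total = sum(class_counts.values())
    scores = {}
    for c, count in class_counts.items():
        log_score = math.log(count / total)  # log prior
        for j, value in enumerate(row):
            # Laplace-smoothed estimate of P(feature j = 1 | class c)
            p_one = (ones[c][j] + 1) / (count + 2)
            log_score += math.log(p_one if value == 1 else 1.0 - p_one)
        scores[c] = log_score
    return max(scores, key=scores.get)


# Hypothetical features: (contains "offer", contains "meeting")
rows = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 0)]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_naive_bayes(rows, labels)
print(predict(model, (1, 0)))
print(predict(model, (0, 1)))
```

The per-feature product in `predict` is exactly where the independence assumption enters: each feature contributes to the score on its own, given the class.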

Unsupervised deep learning techniques like variational autoencoders can be understood as Bayesian inference problems.

Markov Chain Monte Carlo (MCMC) techniques: The posterior in a Bayesian inference problem can sometimes be intractable to compute exactly. Methods like Markov Chain Monte Carlo (MCMC) can be used to approximate it by drawing samples from the posterior.
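As a rough illustration, here is a sketch of the random-walk Metropolis algorithm, one of the simplest MCMC methods, applied to the bias of a coin. This particular posterior is actually tractable (it is a Beta distribution under the assumed Beta prior), which is convenient because the sampled mean can be checked against the exact answer. The step size, sample counts and seed are arbitrary choices for the example.

```python
import math
import random


def log_unnormalized_posterior(p, heads, tails, alpha=1.0, beta=1.0):
    """Log of prior * likelihood for a coin bias p, up to an additive constant."""
    if p <= 0.0 or p >= 1.0:
        return -math.inf  # zero probability outside (0, 1)
    return (heads + alpha - 1) * math.log(p) + (tails + beta - 1) * math.log(1 - p)


def metropolis(heads, tails, n_samples=20000, burn_in=2000, step=0.3, seed=0):
    """Random-walk Metropolis sampler; returns samples drawn after burn-in."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_unnormalized_posterior(current, heads, tails)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = current + rng.gauss(0.0, step)  # symmetric proposal
        proposal_lp = log_unnormalized_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior(proposal) / posterior(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            samples.append(current)
    return samples


heads, tails = 7, 3  # hypothetical observations
samples = metropolis(heads, tails)
sampled_mean = sum(samples) / len(samples)
exact_mean = (heads + 1) / (heads + tails + 2)  # mean of the Beta(8, 4) posterior
print(sampled_mean, exact_mean)
```

Note that the sampler only ever needs the posterior up to a constant, which is what makes MCMC usable when the normalizing constant cannot be computed.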


Final notes

  • The actual implementation of algorithms in APIs may combine multiple techniques.
  • The discussion of frequentist vs Bayesian is only for model building and not for inference i.e. once a


In subsequent parts of this blog, we will explore the difference between frequentist and Bayesian approaches from the standpoint of parameterized models and statistical inference.
