Commit 62136063 authored by Patricia Schmidt

add data to try binder

parent 462af769
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Fair Coin\n",
"\n",
"An introductory example of parameter estimation using Bayesian inference following Example 1 in Chapter 2 of \"Data Analysis\" by D. S. Sivia.\n",
"\n",
"Author: Patricia Schmidt (pschmidt@star.sr.bham.ac.uk)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Instructions\n",
"\n",
"If you have never used a Jupyter notebook before, don't worry!\n",
"\n",
"This is an interactive notebook that allows you evaluate the cells as you go along. To evaluate a cell, select it and hit the play button on the left or press \"shift+return\". "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# EVALUATE ME\n",
"\n",
"!git clone https://github.com/PatriciaSchmidt/Microteaching.git\n",
"!mv Microteaching/coin.py coin.py\n",
"\n",
"%matplotlib inline\n",
"\n",
"import coin"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Activity 1: Is your coin biased?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's find out if your coin is biased! \n",
"\n",
"To do so, simply replace my data, i.e. the result of my coin flips, with yours and re-evaluate the cell. Remember that $F=0$ means tail and $F=1$ means head, so each time you flipped a head, enter a 1 and each time you had a tail, enter a 0.\n",
"\n",
" * The black dashed line indicates $F=0.5$ -- the fair coin.\n",
" * The blue solid curve is the calculated posterior probability.\n",
" * The blue vertical line indicates the maximum posterior probability of the fairness of your coin.\n",
"\n",
"Let's go!"
]
},
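{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A quick note on the math, for the curious:* `coin.calculate_fairness` models the tosses with a binomial likelihood and a uniform prior on $F$. For $X$ heads and $T = N - X$ tails in $N$ tosses, Bayes' theorem then gives\n",
"\n",
"$$P(F \\mid \\mathrm{data}) \\propto F^{X}(1-F)^{T},$$\n",
"\n",
"normalised over a grid of $F$ values. This posterior peaks at $F = X/N$, which is where the blue vertical line will appear."
]
},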
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# EVALUATE ME!\n",
"\n",
"# These are the results from my first coin flip.\n",
"# Simply replace my sequence of 0s and 1s with yours and find out if your coin is fair!\n",
"\n",
"my_results1 = [0,1,1,1,0,0,1,0,1,0]\n",
"\n",
"coin.calculate_fairness(my_results1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# EVALUATE ME!\n",
"\n",
"# These are the results from my second coin flip, after I attached some Blu Tack ton one side of the coin.\n",
"# Let's find out whether or not this biased my coin!\n",
"\n",
"my_results2 = [1,1,1,0,0,1,0,1,1,0]\n",
"\n",
"coin.calculate_fairness(my_results2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, while my 100 Yen coin is fair, it definitely was biased towards tails with a fairness of $F \\sim 0.6$ after I attached some Blu Tack. \n",
"\n",
"What about yours? You could also try different types of coins or vary the number of tosses."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Activity 2: Does the result depend on the number of coin flips?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have only done 10 tosses. Is this a large enough number of tosses aka \"sample size\" or do we need to do more flips to be sure? Could we falsely mistake a truly fair coin for a biased one or vice versa when only doing a small number of tosses? Let's see if the result changes as we vary the number of tosses -- simply evaluate the cell below.\n",
"\n",
"In this activity we assume that our coin is unbiased, i.e. the true fairness value of our coin $F$ is $0.5$. \n",
"You can repeat this experiment assuming that your coin is biased by passing the argument \"pheads=\" a value between 0 and 1 as in the second example. Test this option and note down what you observe.\n",
"\n",
" * The black dashed line indicates $F=0.5$ -- the fair coin.\n",
" * The blue solid curve is the calculated posterior probability for each number of tosses.\n",
" * The blue vertical line indicates the maximum posterior probability of the fairness of your coin."
]
},
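{
"cell_type": "markdown",
"metadata": {},
"source": [
"*A rough rule of thumb:* for a binomial likelihood, the width of the posterior on $F$ shrinks roughly like $\\sqrt{\\hat{F}(1-\\hat{F})/N}$, where $\\hat{F}$ is the observed fraction of heads -- so quadrupling the number of tosses only halves the uncertainty. Keep this scaling in mind when comparing the panels below."
]
},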
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# EVALUATE ME!\n",
"\n",
"# N in the corner of each plot indicates the number of coin flips\n",
"\n",
"coin.vary_tosses()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# EVALUATE ME!\n",
"\n",
"# N in the corner of each plot indicates the number of coin flips\n",
"\n",
"coin.vary_tosses(pheads=0.8)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Activity 3: What about the prior?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So far we have used a uniform prior, i.e. we assumed a priori that all values of fairness $F \\in [0.1]$ are equally probable. How would our inference about the fairness of the coin have changed if we had chosen a different prior? Let's find out!\n",
"\n",
"Instead of a uniform prior, we could for example have chosen a normal (Gaussian) distribution peaked at $F=0.5$ with a variance of 0.05, which reflects our background information that most coins are fair.\n",
"\n",
"Does the result change?\n",
"\n",
" * The black dashed line indicates $F=0.5$ -- the fair coin.\n",
" * The blue solid curve is the calculated posterior probability for each number of tosses.\n",
" * The blue vertical line indicates the maximum posterior probability of the fairness of your coin."
]
},
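{
"cell_type": "markdown",
"metadata": {},
"source": [
"*For the curious:* with the Gaussian prior, `coin.calculate_fairness_Gaussian` simply multiplies the binomial likelihood by the prior density before normalising:\n",
"\n",
"$$P(F \\mid \\mathrm{data}) \\propto F^{X}(1-F)^{T} \\exp\\left(-\\frac{(F-0.5)^{2}}{2\\sigma^{2}}\\right),$$\n",
"\n",
"with $\\sigma = 0.05$."
]
},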
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# EVALUATE ME!\n",
"\n",
"# Using the measurements from your coin flip (see Activity 1), \n",
"# let's find out how fair your coin is if we assume a Gaussian prior instead of a uniform one!\n",
"\n",
"coin.calculate_fairness_Gaussian(my_results1)\n",
"coin.calculate_fairness_Gaussian(my_results2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Is the outcome the same as in Activity 1 or do you get a different result? \n",
"\n",
"Previously, we have already seen that the number of tosses $N$ plays an important role and that our results may not give the true answer if our sample size is too small. So let's increase the number of tosses as we did in Activity 2 but let's now use the Gaussian prior. Do you observe any differences?\n",
"\n",
" * The black dashed line indicates $F=0.5$ -- the fair coin.\n",
" * The blue solid curve is the calculated posterior probability assuming a uniform prior on $F$.\n",
" * The red dashed curve is the calculated posterior probability assuming a Gaussian prior peaked at $F=0.5$ on $F$.\n",
" * The vertical lines indicate the maximum posterior probabilities for the corresponding posterior distributions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Let's see what we get if our true coin is unbiased:\n",
"\n",
"coin.vary_tosses_Gaussian()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Let's see what happens if our coin is biased heavily towards tails.\n",
"# In our example the true rate of heads is 1 head in 10 tosses.\n",
"\n",
"coin.vary_tosses_Gaussian(pheads=0.1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you compare the results, what do you notice about the impact of the prior? Does one prior allow you to reach the correct result quicker (i.e. a lower number of tosses) than the other? Test this also for differently biased coins by varying \"pheads=\". "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
# -*- coding: utf-8 -*-
#
# Copyright 2020
# Patricia Schmidt <pschmidt@star.sr.bham.ac.uk>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
def generate_HT(pheads=0.5):
    '''
    Function to generate a random number representing head (1) or tail (0) in
    a coin toss. Through the parameter pheads in [0.0, 1.0] you can specify
    how biased you want your coin to be: if pheads = 0.0, you have a totally
    biased coin in favour of tails; if pheads = 1.0, you have a totally biased
    coin in favour of heads.
    '''
    # np.random.random() draws uniformly from [0, 1), so this comparison
    # yields heads with probability pheads, including the edge cases
    # pheads = 0.0 (always tails) and pheads = 1.0 (always heads).
    return 1 if np.random.random() < pheads else 0
def simulation(N=100, pheads=0.5):
    '''
    This function simulates N tosses of a coin. You can bias your coin by
    specifying pheads in the range 0.0 to 1.0.
    The function returns:
    X: The number of heads obtained in N tosses
    T = N-X: The number of tails in N tosses
    outcomes: A list containing all outcomes in sequential order
    '''
    outcomes = []
    X = 0
    for i in range(N):
        out = generate_HT(pheads)
        if out == 1:
            X += 1
        outcomes.append(out)
    T = N - X
    return X, T, outcomes
def plot_posterior(X, T):
    '''
    Function to plot the posterior distribution.
    The assumption here is that each toss is independent of the others, and we
    use a binomial to model the likelihood function. Because we don't have any
    idea of the fairness of the coin, we just use a uniform distribution to
    model the prior.
    X: Number of heads in N tosses
    T: Number of tails in N tosses
    '''
    # F is our measure of fairness, which can take values between 0 and 1
    F = np.linspace(1e-6, 1.-1e-6, 500)
    # Log-posterior of F assuming a uniform prior
    lny = X*np.log(F) + T*np.log(1-F)
    # Subtract the maximum before exponentiating for numerical stability
    lny = lny - np.max(lny)
    y = np.exp(lny)
    # Normalise the distribution over the grid of F values
    y = y/np.sum(y)
    # Maximum a posteriori value of F
    idx = np.argmax(y)
    print("Maximum a posteriori value of the fairness: F=%f" % F[idx])
    plt.plot(F, y, c='blue')
    plt.xlabel('Fairness F', fontsize=14)
    plt.ylabel('Probability(F)', fontsize=14)
    plt.axvline(0.5, ls='--', c='k')
    plt.axvline(F[idx], ls='--', c='blue')
    plt.title(str(X) + ' heads, ' + str(T) + ' tails, total ' + str(X+T) + ' tosses')
    plt.show()
def calculate_fairness(data):
    '''
    Function to calculate the fairness F of a coin given some data, e.g. the
    result of your N coin flips, assuming a uniform prior on F.
    It plots the posterior probability density of F and prints the maximum
    a posteriori value.
    '''
    # Number of heads in my data:
    X = data.count(1)
    # Number of tails:
    T = len(data) - X
    # The range of our fairness parameter F:
    F = np.linspace(1e-6, 1.-1e-6, 500)
    # Log-posterior of F assuming a uniform prior
    lny = X*np.log(F) + T*np.log(1-F)
    # Subtract the maximum before exponentiating for numerical stability
    lny = lny - np.max(lny)
    y = np.exp(lny)
    # Normalise the distribution over the grid of F values
    y = y/np.sum(y)
    # Maximum a posteriori value of F
    idx = np.argmax(y)
    print("The maximum a posteriori value of the fairness of your coin is: F=%f" % F[idx])
    plt.plot(F, y, c='blue')
    plt.xlabel('Fairness F', fontsize=14)
    plt.ylabel('Probability(F)', fontsize=14)
    plt.axvline(0.5, ls='--', c='k')
    plt.axvline(F[idx], c='blue')
    plt.title(str(X) + ' heads, ' + str(T) + ' tails, total ' + str(X+T) + ' tosses')
    plt.show()
def calculate_fairness_Gaussian(data):
    '''
    Function to calculate the fairness F of a coin given some data, e.g. the
    result of your N coin flips, assuming a Gaussian prior on F peaked at 0.5.
    It plots the posterior probability density of F and prints the maximum
    a posteriori value.
    '''
    # Number of heads in my data:
    X = data.count(1)
    # Number of tails:
    T = len(data) - X
    # The range of our fairness parameter F:
    F = np.linspace(1e-6, 1.-1e-6, 500)
    # Gaussian prior for the fairness, centred on 0.5 with standard deviation 0.05:
    prior = stats.norm.pdf(F, loc=0.5, scale=0.05)
    # Log-posterior of F: log-likelihood plus log-prior
    lny = X*np.log(F) + T*np.log(1-F) + np.log(prior)
    # Subtract the maximum before exponentiating for numerical stability
    lny = lny - np.max(lny)
    y = np.exp(lny)
    # Normalise the distribution over the grid of F values
    y = y/np.sum(y)
    # Maximum a posteriori value of F
    idx = np.argmax(y)
    print("The maximum a posteriori value of the fairness of your coin is: F=%f" % F[idx])
    plt.plot(F, y, c='blue')
    plt.xlabel('Fairness F', fontsize=14)
    plt.ylabel('Probability(F)', fontsize=14)
    plt.axvline(0.5, ls='--', c='k')
    plt.axvline(F[idx], c='blue')
    plt.title(str(X) + ' heads, ' + str(T) + ' tails, total ' + str(X+T) + ' tosses')
    plt.show()
def vary_tosses(pheads=0.5):
    '''
    Function to compute the posterior probability of the fairness F for an
    increasing number of coin tosses N, varied between 0 and 4096. We assume
    a uniform prior on the fairness F.
    '''
    fig = plt.figure(figsize=(15, 15))
    n_tosses = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]
    F = np.linspace(1e-6, 1.-1e-6, 500)
    # Simulate the longest run once; shorter runs reuse its first N outcomes
    X_4096, T_4096, result_4096 = simulation(N=4096, pheads=pheads)
    for i, toss in enumerate(n_tosses):
        # Get the result up to N=toss
        result = result_4096[0:toss]
        # Get the number of heads
        X = result.count(1)
        # Get the number of tails
        T = toss - X
        # Log-posterior of F assuming a uniform prior
        lny = X*np.log(F) + T*np.log(1-F)
        # Subtract the maximum before exponentiating for numerical stability
        lny = lny - np.max(lny)
        y = np.exp(lny)
        # Normalise the distribution over the grid of F values
        y = y/np.sum(y)
        # Plot the posterior
        ax = plt.subplot(5, 3, i+1)
        ax.plot(F, y, label='N=%s' % toss, c='blue')
        ax.legend(loc='best')
        # Set the x-axis major tick labels
        ax.set_xticks([0, 0.25, 0.5, 0.75, 1])
        ax.set_xticklabels(['0', '', '0.5', '', '1'])
        plt.axvline(0.5, ls='--', c='k')
        ax.set_xlabel('Fairness F', fontsize=14)
        ax.set_ylabel('Probability(F)', fontsize=14)
        if pheads > 0.0:
            # Mark the maximum a posteriori value of F
            idx = np.argmax(y)
            plt.axvline(F[idx], c='blue')
    plt.tight_layout()
    plt.show()
def vary_tosses_Gaussian(pheads=0.5):
    '''
    Function to compute the posterior probability of the fairness F for an
    increasing number of coin tosses N, varied between 0 and 4096. We assume
    a Gaussian prior peaked at 0.5 on the fairness F, and compare it against
    the uniform prior.
    '''
    fig = plt.figure(figsize=(15, 15))
    n_tosses = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]
    F = np.linspace(1e-6, 1.-1e-6, 500)
    # Simulate the longest run once; shorter runs reuse its first N outcomes
    X_4096, T_4096, result_4096 = simulation(N=4096, pheads=pheads)
    for i, toss in enumerate(n_tosses):
        # Get the result up to N=toss
        result = result_4096[0:toss]
        # Get the number of heads
        X = result.count(1)
        # Get the number of tails
        T = toss - X
        # Posterior of F using a uniform prior:
        lny = X*np.log(F) + T*np.log(1-F)
        # Subtract the maximum before exponentiating for numerical stability
        lny = lny - np.max(lny)
        y = np.exp(lny)
        # Normalise the distribution over the grid of F values
        y = y/np.sum(y)
        # Posterior of F using a Gaussian prior:
        prior = stats.norm.pdf(F, loc=0.5, scale=0.05)
        lnyG = X*np.log(F) + T*np.log(1-F) + np.log(prior)
        lnyG = lnyG - np.max(lnyG)
        yG = np.exp(lnyG)
        # Normalise the distribution over the grid of F values
        yG = yG/np.sum(yG)
        # Plot both posteriors
        ax = plt.subplot(5, 3, i+1)
        ax.plot(F, y, label='N=%s' % toss, c='blue')
        ax.plot(F, yG, ls='--', c='red')
        ax.legend(loc='best')
        # Set the x-axis major tick labels
        ax.set_xticks([0, 0.25, 0.5, 0.75, 1])
        ax.set_xticklabels(['0', '', '0.5', '', '1'])
        ax.set_xlabel('Fairness F', fontsize=14)
        ax.set_ylabel('Probability(F)', fontsize=14)
        plt.axvline(0.5, ls='--', c='k')
        if pheads > 0.0:
            # Mark the maximum a posteriori values of F for both priors
            idx = np.argmax(y)
            idxG = np.argmax(yG)
            plt.axvline(F[idx], c='blue')
            plt.axvline(F[idxG], ls='--', c='red')
    plt.tight_layout()
    plt.show()
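# A minimal usage sketch: running this module directly simulates a mildly
# biased coin and plots the resulting posterior. The pheads value below is an
# arbitrary illustration, not a setting used anywhere in the notebook.
if __name__ == "__main__":
    # Simulate 100 tosses of a coin slightly biased towards heads
    X, T, outcomes = simulation(N=100, pheads=0.6)
    # Plot the posterior on the fairness F under a uniform prior
    plot_posterior(X, T)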