In this notebook, we will see how to correctly estimate variance and standard deviation from a sample drawn from a normal distribution. This notebook is inspired from one of the videos of Joshua Starmer.

# Let's first load our numerical Python library
import numpy as np
# Get a normal distribution with mean 50 and standard deviation of 10. 
population = np.random.normal(50,10,1000) 
print(population)
[58.51096494 54.35604739 31.33596155 36.57228386 42.90854289 36.97196961
 35.13454667 55.25576274 51.56395122 41.97458864 50.27704343 36.86389636
 53.38015128 62.93762022 57.42527844 58.04615529 60.57570213 52.1408575
 34.20236425 45.22173563 46.13978888 59.43242208 33.51554549 56.62330495
 54.75329618 49.36186641 38.83343647 47.47763538 46.91126975 47.29980537
 52.23450113 44.24877717 59.46549651 61.70291118 59.4483032  41.857741
 45.41888587 53.85452025 42.36669454 35.65081016 48.04063376 63.99663189
 40.68565984 51.37703171 54.4172925  57.46578844 46.27661915 58.03179744
 62.51838416 62.98572275 51.52055047 19.66066686 43.34078901 64.6356989
 52.92457845 64.06370046 42.01320803 50.467454   58.05187468 54.09740891
 52.16034583 68.57232819 40.78513094 43.75226304 41.93004599 57.26628367
 55.34516409 42.14888487 52.0272703  37.37381092 52.45937831 42.11467659
 40.79862197 55.38373167 52.52087153 66.45867675 54.12912889 47.80582615
 43.41857924 57.36261701 70.58417248 42.06219855 53.07191289 63.76790345
 46.75568293 46.93813111 17.66432128 36.75023492 56.27848596 54.05479092
 63.66499998 46.75504911 34.00748939 42.38814276 46.74203582 60.18576971
 42.05833903 51.71169961 44.59957632 46.92590804 45.94141012 38.65755598
 44.14958657 49.5389402  47.35966325 51.29152648 58.85338909 63.50920556
 49.93560833 62.66415662 31.71489886 42.77585097 41.32468306 45.55556494
 65.10974201 40.89232198 47.5511002  56.43909899 49.48381223 52.91343712
 39.47421772 67.77739403 60.72900523 39.97829149 68.06881237 59.79660531
 31.65758034 55.07219281 42.92833436 52.06491723 30.060851   52.37925578
 54.39871949 44.92111894 64.13093339 62.11641469 46.38508628 57.36817257
 58.26177699 46.88680173 36.21335945 43.9322849  58.3311759  43.91176886
 52.41432636 50.83977639 59.85490113 46.05610976 71.45861594 52.22685539
 44.00201372 57.32445064 62.00719335 58.69462172 49.27914532 57.00415921
 49.93580194 55.37582015 47.23973518 59.21763863 53.6469019  45.13558853
 46.0037842  52.27731902 47.23756395 53.69869932 51.14977146 68.37962671
 64.03591208 45.11426445 49.30640391 48.08267892 45.8729829  75.53392551
 57.18522264 58.25322278 49.75479668 46.22039597 45.2445972  41.5284812
 48.75887917 61.17602989 53.42484972 34.0726375  55.40448624 60.16040834
 37.9460503  61.2902726  39.8750727  38.44802569 34.24799012 44.37329522
 77.40004751 55.9564813  43.84367124 43.36344487 55.3518386  61.78866091
 46.05348253 48.50032037 51.17089634 50.58437475 43.98087643 60.00187036
 70.02485312 37.96331323 51.95699686 51.62269832 50.33664439 42.96724272
 45.41090945 50.45388463 64.62563524 38.51322043 58.40053183 57.30300646
 41.37851578 36.46997478 60.84485799 60.13407462 48.91140784 59.20193324
 50.68985991 44.72945308 44.56929802 42.93491369 70.03763053 37.73617426
 52.23308832 53.54772195 46.02820808 42.88867465 34.0681359  48.03719508
 55.00216931 50.62007321 51.22324954 60.74038459 42.32602319 60.25148338
 40.42599181 44.96140085 42.53408429 64.78558745 47.35716503 46.39926564
 60.21675031 50.0721944  49.21114484 33.71666446 38.45886166 60.46614971
 35.73292904 47.41651592 44.43813335 46.28166266 48.11111347 52.89035593
 28.35521275 40.20776218 43.62759573 49.72344355 54.1621804  44.36229114
 52.4634741  57.31953108 51.31100929 32.41471191 43.11952783 44.77520469
 45.14186805 41.93922943 32.59035521 53.5145599  33.18437994 58.49611286
 55.20562616 63.29991818 48.02451593 79.47850982 50.7029197  50.4023675
 71.05384364 48.29939333 55.70469834 33.49783818 41.8691827  41.7012603
 45.53307266 47.29138402 64.13354316 47.12574256 70.20041752 49.66024371
 49.559411   58.1178099  45.34679762 59.64728617 49.09077302 51.47556396
 44.45338838 43.34634532 58.35770115 34.07760973 48.66480454 44.74156175
 41.08820637 47.11750917 42.14862141 54.01841966 60.48067793 38.76461905
 41.39985613 45.95136909 49.17338084 54.32134154 41.2009738  62.68308551
 47.36324522 75.88759778 27.50563669 36.18401295 58.60229513 54.39563868
 67.04009668 57.01896297 51.01012555 51.49530581 50.47816671 42.03828336
 53.12659145 53.51307739 57.56261829 43.2908397  46.68623358 52.84365012
 49.15381598 63.9502998  33.50141534 53.71450471 63.43468185 47.82661348
 43.05471704 59.39932313 48.81512239 43.76293699 68.24340294 61.09774568
 30.35784896 49.70337575 57.74451337 59.97710016 55.21639336 52.86200624
 27.2974423  65.4382073  50.4260714  48.92221927 51.69491938 45.00160552
 54.84922677 51.69075166 68.42347898 43.95118525 44.68437386 54.56276934
 32.26022757 64.45378105 30.10098436 50.25778809 48.07727539 35.64012966
 41.04297655 43.67350023 34.57500939 41.81590636 44.01680877 39.11304667
 59.1931689  52.37711148 52.02121863 29.03945856 52.33524351 49.86704654
 46.2931372  60.58074297 50.9950636  69.09004623 58.70462633 42.27650569
 44.36170478 42.93534129 39.302986   38.09897613 44.59472599 63.39582636
 46.28641522 41.57425464 36.06785033 39.52303357 56.43135697 55.96691894
 65.21291677 61.74007416 45.13123978 56.13100022 61.51559825 31.7873638
 45.76611308 53.15108226 52.77109891 53.15575295 29.77157794 43.08954062
 47.99506935 55.61867997 46.97201937 28.43563639 53.43101445 44.16249027
 53.98543909 67.13787698 50.87926553 56.46063134 58.07790144 48.52877863
 55.78998749 64.06619516 43.98201502 61.82247295 36.16768478 52.51228381
 57.9427833  42.11730039 43.68763634 63.96317807 59.9947685  69.34439481
 40.59009288 56.08872209 38.84294418 45.45856975 48.55267188 41.71169287
 39.92883414 69.20615648 47.38047677 61.45942241 45.29114806 56.53572208
 59.83105264 40.85755755 52.5204959  44.0065352  38.0946745  45.02581979
 61.80088726 46.01780949 62.17877112 50.13034337 51.34919083 59.41767124
 49.77341745 59.63220659 49.56469068 57.68744576 61.51253135 47.27587012
 53.56287571 45.02683003 38.330385   43.35626584 29.47502671 40.42516975
 54.9226395  42.28335826 52.87229012 62.38712988 59.69950233 40.42411118
 31.48175516 49.67783827 33.89851448 52.28717948 44.96659197 51.47590277
 34.93850244 57.73566694 60.89800716 44.17224126 63.08746073 60.37272909
 45.13467748 51.69210213 36.6363108  55.31960621 34.36765993 50.03104753
 50.78958667 42.60151433 64.0133207  57.22107613 43.95147554 62.71201309
 42.07814405 35.73282622 46.17090501 48.78775769 52.90342725 61.80216789
 57.9474729  49.38958485 37.87413512 46.4268394  53.57627453 57.60654114
 67.76153181 55.79113521 49.61178061 46.43990818 48.4825442  55.34829485
 48.72903894 48.19268357 48.50917974 51.24470476 47.46879236 47.21829246
 49.32277011 49.35807704 42.51136543 45.44776367 34.77563589 54.49732945
 55.89929189 71.57775511 55.8249279  54.18766579 49.54094994 51.21227795
 47.97257991 65.31360263 45.55497523 61.67714986 45.35154717 64.04770798
 46.29995602 48.06598384 33.56169626 63.54680926 55.77555952 56.11513685
 47.47263226 54.24136662 43.45953839 39.34363329 52.58076217 61.01586647
 48.15022444 24.83996489 64.84617011 41.0790905  45.47235289 60.02259144
 61.63534214 52.97495023 60.98821853 36.92861611 33.75849514 60.33437217
 64.35735401 45.32146583 54.7893578  71.75107078 52.82198035 41.7601234
 48.37888468 45.1002055  32.06683476 33.82193308 51.49171961 42.39942339
 46.70529199 40.84289176 51.87811132 45.09414022 45.42296943 58.75015627
 54.62838222 55.46469676 40.7269237  62.6759042  34.02031893 45.21648797
 66.09646281 52.86228892 39.81338101 49.8560803  63.13005886 47.86277289
 51.18245403 50.57558467 49.96991612 36.96218052 35.38406275 49.44725482
 46.2311058  59.04350192 54.79755392 38.37738144 35.09848153 41.4762303
 67.91539198 46.47579071 53.50165082 43.5359611  72.14400855 25.59321279
 50.91693391 42.10581037 58.57708552 82.73689794 30.2631204  45.52738709
 58.47952795 34.17603754 45.05027994 46.36174979 46.23891291 57.98396497
 48.13482296 56.60699383 53.57498726 58.9369787  54.06616216 45.14717773
 44.90942718 34.64623799 52.76597582 68.23797452 35.949395   37.09664461
 42.75728407 46.75306273 46.41234785 57.79683466 56.59639875 40.1817262
 48.27128178 59.54665345 53.03297887 52.81556512 36.75657142 50.1200687
 51.21060102 31.35632324 54.97554183 42.84699006 59.99811637 57.83293527
 28.03294696 37.41765319 53.60018161 52.05627379 54.12906161 58.88384089
 62.65427574 46.49585712 56.36830463 50.49391288 39.02786219 45.87895296
 75.67791255 50.52360876 48.55892073 59.14293036 62.85863397 32.41366531
 55.07957771 54.33498896 48.55870297 38.49246238 57.78413653 42.76000489
 40.92769185 39.48698616 66.72493127 54.31231549 62.7552338  48.64071963
 48.39254684 65.67570385 54.65836525 61.23394683 33.73537958 57.80757459
 56.5764356  47.8983212  55.79906532 53.09176105 50.35454496 40.37617177
 60.50239307 58.84959494 39.15484227 52.22078302 46.57422764 73.84958389
 55.50118364 51.18285875 60.05236918 56.03054919 37.62677643 51.87882465
 49.33680259 61.10397031 48.12217008 47.13187624 49.69950875 64.69869988
 47.80308841 44.83211063 55.66044527 55.07855233 60.23199381 38.81732902
 50.53567464 47.55353897 51.97006232 52.98275675 43.79122873 35.66346247
 62.55868663 57.87154779 55.87487594 59.03294044 47.33728847 56.81016574
 48.28954381 65.08284945 36.61028276 49.02066382 51.70056191 64.86318625
 61.24854936 83.69818858 40.53285693 59.30584708 72.49671145 54.50743497
 48.84209337 56.3384186  56.67930138 51.97511181 33.06847685 36.38508403
 61.11913872 47.84623061 41.69338861 38.52841889 43.81601764 52.51927661
 63.6331492  50.3647831  51.23634381 73.62692328 39.72300483 37.81092441
 52.41711418 68.69280584 70.3696588  36.91871383 68.75869397 46.93866142
 44.42375726 43.00603905 46.05819136 33.21286909 76.194464   44.06256157
 28.26897732 45.34795379 45.48771541 47.72259469 51.88006535 42.12853507
 33.49825649 59.71215709 61.02741193 53.20963859 39.37945374 54.90873018
 54.62641189 61.74289395 49.09669053 50.85924675 70.15034461 41.93771844
 48.72307761 49.26773134 63.50877896 63.12805277 56.45389376 60.1409223
 45.81620509 64.13384228 45.10575015 57.5095312  45.66746652 61.16909089
 56.26608962 42.85033856 46.09789685 53.98468952 30.58998789 39.36405962
 58.67831381 78.67421742 68.55495654 55.0704614  52.60012306 37.63721457
 71.0207543  44.32605482 56.60938092 25.84387598 53.2508036  40.78062619
 64.76763536 39.56214849 50.09036958 39.57314883 52.37263924 49.56666753
 37.64776052 54.0292777  41.78508542 60.7683851  45.47224944 56.60087722
 30.04337906 37.70717237 43.61402984 58.71157549 36.53530844 56.14070341
 48.5178436  26.33137765 53.064869   57.42704281 69.88141036 37.42105113
 49.571482   62.49978257 41.82770121 51.99811477 52.68746446 68.2275939
 66.27690807 38.25367906 37.70432779 33.61343438 61.98907265 58.49620023
 54.66462591 55.99391436 59.73823198 61.80593537 56.75157738 53.44822996
 67.60863781 34.36274026 60.09806236 57.69858649 54.26640744 52.12764028
 48.88569877 48.45100788 28.65500928 62.17994039 51.22288918 47.06206832
 56.00217309 47.92471579 60.48252174 51.34942825 47.53720027 48.20167066
 45.74429898 45.15151873 45.60251218 48.35390217 54.65392456 36.35837714
 59.47836705 63.9289862  54.06059053 56.21956586 54.8744229  45.47299815
 50.65641433 56.29517728 71.55494667 35.85899844 58.50067322 39.97882937
 61.55906006 64.71027862 55.17090397 36.99902809 42.53585858 36.71888243
 45.28987641 44.90101285 46.47951069 47.14016953 57.06912335 40.93483172
 61.75355186 52.14479545 43.35944473 57.3719678  80.4671838  44.0885481
 52.60968177 56.26706134 67.90672521 45.85426921 39.20789988 53.09135038
 60.8442408  56.75929878 44.67995356 30.94614755 41.61155278 51.15634205
 62.29844112 42.44369008 42.04526607 63.19260813 56.00134299 44.42712704
 44.03554724 42.14117403 46.09224851 44.90139366 56.10333955 52.43008965
 39.93974675 33.86734678 47.13259771 40.9797919  55.20426557 63.07301395
 75.43053293 34.64212875 52.6674399  51.80778463 34.94050734 71.35951745
 48.07155908 44.9818667  27.56352705 42.97401754 65.0744588  61.10336331
 42.14753298 53.13054279 44.28573645 51.95610201 49.60279374 60.40512254
 57.26598533 36.85380582 56.16422214 41.58554363 47.41436028 54.5281129
 60.5228076  42.17761257 41.51684936 44.51646853 47.828988   49.35521355
 40.46927314 36.46284205 30.20228751 55.81986289 52.18030238 47.90654409
 57.00444703 39.68413831 42.02616983 54.77297715 41.7325526  45.4392458
 76.47572443 54.390671   55.165547   53.44637988]

Our population is a normal(Gaussian) distribution with a mean of 50 and standard deviation of 10. Let's plot it...

from matplotlib import pyplot as plt  # For plotting
   
plt.hist(population,50)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.text(15, 45, r'$\mu=50, \sigma^2=10$') 
plt.title("histogram") 
plt.show()

Now, we will draw a random sample from our normal distribution.

# Draw 100 random samples from population
sample = np.random.choice(population, 100)
# calculate mean of sample
mean_sample = np.mean(sample)
print("sample mean = ", mean_sample)
sample mean =  49.39970427108535

We can see that the mean of the sample (49.81) is near to the actual mean of population. This was expected as we have less data in the sample. Let's now try to estimate the variance and standard deviation of the population from the sample data. Please note that one can only estimate and not calculate the variance and standard deviation of the population. This is because most of the times, we don't have the original population but only a small sample. The formula to estimate the variance and standard deviation of the population is as follows:

$$\sigma^2_{estimated} = \frac{\Sigma (x - \bar{x})}{n-1} $$

Where, $\bar{x}$ is the mean of the sample, x is the individual value in the sample, and n is total number of measurements in the sample (100 in our case). Please note that in the equation, we use $n-1$ and not $n$. This is because we will underestimate our variance if we divide by $n$. For a very detailed proof, please refer to this video. Let's see if this is true...

# estimate variance and standard deviation of the sample

est_variance = np.var(sample, ddof = 1) # ddof = 1 means we divide by total number of measurements - 1
std_sample = np.sqrt(est_variance)
print("Estimated variance = ", est_variance, "and estimated standard deviation = ", std_sample)
Estimated variance =  116.17893624576945 and estimated standard deviation =  10.778633319942257

Let's now estimate incorrect variance and standard deviation

# incorrect variance and standard deviation

incorrect_variance = np.var(sample)
incorrect_std_sample = np.std(sample)
print(f'Incorrect variance is {incorrect_variance} and biased standard deviation is {incorrect_std_sample}')
Incorrect variance is 115.01714688331177 and biased standard deviation is 10.724604742521366

As one can see, we underestimated both variance and standard deviation if we use incorrect formula.