September 3, 2014

How to fit data that looks like a Gaussian?

I am quite new to statistics, so please forgive me if I use the wrong vocabulary.
I have some data that looks (to me) like a Gaussian when plotted.
The data is extracted from a JPEG image. It is a vertical line of pixels taken from the image, and only the red channel (from RGB) is used.
Here is the full data (27 data points):
> r
 [1] 0.003921569 0.031372549 0.023529412 0.015686275 0.003921569 0.027450980
 [7] 0.003921569 0.015686275 0.031372549 0.105882353 0.305882353 0.490196078
[13] 0.560784314 0.615686275 0.592156863 0.505882353 0.364705882 0.227450980
[19] 0.050980392 0.031372549 0.019607843 0.054901961 0.031372549 0.015686275
[25] 0.027450980 0.003921569 0.011764706

> dput(r)
c(0.00392156862745098, 0.0313725490196078, 0.0235294117647059, 
0.0156862745098039, 0.00392156862745098, 0.0274509803921569, 
0.00392156862745098, 0.0156862745098039, 0.0313725490196078, 
0.105882352941176, 0.305882352941176, 0.490196078431373, 0.56078431372549, 
0.615686274509804, 0.592156862745098, 0.505882352941176, 0.364705882352941, 
0.227450980392157, 0.0509803921568627, 0.0313725490196078, 0.0196078431372549, 
0.0549019607843137, 0.0313725490196078, 0.0156862745098039, 0.0274509803921569, 
0.00392156862745098, 0.0117647058823529)
plot(r)
 
-------------------
 
Fitting a distribution is, roughly speaking, what you'd do if you made a histogram of your data, and tried to see what sort of shape it had. What you're doing, instead, is simply plotting a curve. That curve happens to have a hump in the middle, like what you get by plotting a gaussian density function.
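As a rough illustration of that distinction, here is a sketch of what fitting a distribution would look like. The sample z is simulated (it is not your data), and the names z, m_hat and s_hat are made up for this example: the parameters are estimated from the sample values themselves, and the fitted density is then compared with a histogram.

```r
# Hypothetical sample: 200 simulated draws, NOT the data from the question
set.seed(1)
z <- rnorm(200, mean = 15, sd = 2)

# Fitting a distribution: estimate its parameters from the sample itself
m_hat <- mean(z)
s_hat <- sd(z)

# Compare the histogram (on the density scale) with the fitted normal density
hist(z, freq = FALSE, main = "Histogram with fitted normal density")
curve(dnorm(x, mean = m_hat, sd = s_hat), add = TRUE, col = 2)
```

Your r values are not a sample like z; they are heights of a curve measured at positions 1 to 27, which is why a curve fit is the more natural tool.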
To get what you want, you can use something like optim to fit the curve to your data. The following code will use nonlinear least-squares to find the three parameters giving the best-fitting gaussian curve: m is the gaussian mean, s is the standard deviation, and k is an arbitrary scaling parameter (since the gaussian density is constrained to integrate to 1, whereas your data isn't).
x <- seq_along(r)

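# Sum of squared residuals between the data and a scaled gaussian curve
f <- function(par)
{
    m <- par[1]    # mean (location of the peak)
    s <- par[2]    # standard deviation (width); named s to avoid masking sd()
    k <- par[3]    # scaling (height of the peak)
    rhat <- k * exp(-0.5 * ((x - m)/s)^2)
    sum((r - rhat)^2)
}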

optim(c(15, 2, 1), f, method="BFGS", control=list(reltol=1e-9))
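
To inspect the result, one option is to store what optim returns, check its convergence code, and draw the fitted curve over the points. This is only a sketch: the names fit and est are introduced here for illustration, and r is re-entered from the rounded values printed in the question so that the block runs on its own.

```r
# Data from the question (rounded values as printed there)
r <- c(0.003921569, 0.031372549, 0.023529412, 0.015686275, 0.003921569, 0.027450980,
       0.003921569, 0.015686275, 0.031372549, 0.105882353, 0.305882353, 0.490196078,
       0.560784314, 0.615686275, 0.592156863, 0.505882353, 0.364705882, 0.227450980,
       0.050980392, 0.031372549, 0.019607843, 0.054901961, 0.031372549, 0.015686275,
       0.027450980, 0.003921569, 0.011764706)
x <- seq_along(r)

# Sum of squared residuals for a scaled gaussian curve
f <- function(par) {
    m <- par[1]; s <- par[2]; k <- par[3]
    rhat <- k * exp(-0.5 * ((x - m)/s)^2)
    sum((r - rhat)^2)
}

fit <- optim(c(15, 2, 1), f, method = "BFGS", control = list(reltol = 1e-9))
fit$convergence                               # 0 means optim reports convergence
est <- setNames(fit$par, c("m", "s", "k"))    # label the three estimates
est

# Overlay the fitted curve on the data
plot(x, r)
curve(est[["k"]] * exp(-0.5 * ((x - est[["m"]])/est[["s"]])^2), add = TRUE, col = 2)
```

Since s only enters the curve squared, its sign is not identified; if optim returns a negative value, take abs(est[["s"]]).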
 
-------------------
 
I propose to use non-linear least squares for this analysis.
# First put the data in a data frame
tab <- data.frame(x = seq_along(r), r = r)
# Fit the curve with nls; mu, sigma and k are estimated from the starting values
(res <- nls(r ~ k*exp(-1/2*(x-mu)^2/sigma^2), start = c(mu=15, sigma=5, k=1), data = tab))
And from the output, I was able to obtain the following fitted "Gaussian curve":
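# Extract the estimates (a named vector in the order mu, sigma, k)
v <- summary(res)$parameters[,"Estimate"]
plot(r~x, data=tab)
# Overlay the fitted curve in red
plot(function(x) v["k"]*exp(-1/2*(x-v["mu"])^2/v["sigma"]^2), col=2, add=TRUE, xlim=range(tab$x))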
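
A possible variation, given only as a sketch: coef() returns the same estimates, and predict() can evaluate the fitted model on a finer grid so that the formula does not have to be typed a second time. The name grid is introduced here for illustration, and r is re-entered from the rounded values printed in the question so that the block runs on its own.

```r
# Data from the question (rounded values as printed there)
r <- c(0.003921569, 0.031372549, 0.023529412, 0.015686275, 0.003921569, 0.027450980,
       0.003921569, 0.015686275, 0.031372549, 0.105882353, 0.305882353, 0.490196078,
       0.560784314, 0.615686275, 0.592156863, 0.505882353, 0.364705882, 0.227450980,
       0.050980392, 0.031372549, 0.019607843, 0.054901961, 0.031372549, 0.015686275,
       0.027450980, 0.003921569, 0.011764706)
tab <- data.frame(x = seq_along(r), r = r)

# Same model and starting values as above
res <- nls(r ~ k*exp(-1/2*(x-mu)^2/sigma^2),
           start = c(mu = 15, sigma = 5, k = 1), data = tab)

coef(res)       # point estimates of mu, sigma and k
summary(res)    # adds standard errors and t statistics

# Evaluate the fitted curve on a fine grid and overlay it on the data
grid <- data.frame(x = seq(min(tab$x), max(tab$x), length.out = 200))
plot(r ~ x, data = tab)
lines(grid$x, predict(res, newdata = grid), col = 2)
```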