Title: | The Davies Quantile Function |
---|---|
Description: | Various utilities for the Davies distribution. |
Authors: | Robin K. S. Hankin |
Maintainer: | Robin K. S. Hankin <[email protected]> |
License: | GPL-2 |
Version: | 1.2-0 |
Built: | 2025-01-03 02:41:19 UTC |
Source: | https://github.com/robinhankin/davies |
Density, distribution function, quantile function and random generation for the Davies distribution.
ddavies(x, params,log=FALSE) pdavies(x, params,log.p=FALSE,lower.tail=TRUE) qdavies(p, params,lower.tail=TRUE) rdavies(n, params) ddavies.p(x,params,log=FALSE)
ddavies(x, params,log=FALSE) pdavies(x, params,log.p=FALSE,lower.tail=TRUE) qdavies(p, params,lower.tail=TRUE) rdavies(n, params) ddavies.p(x,params,log=FALSE)
x |
quantile |
p |
vector of probabilities |
n |
number of observations. If |
lower.tail |
logical; if |
log , log.p
|
logical; if |
params |
A three-member vector holding \(C\), \(\lambda_1\) and \(\lambda_2\) |
The Davies distribution is defined in terms of its quantile function:
Cp^lambda_1/(1-p)^lambda2
It does not have a closed-form probability density function or cumulative density function, so numerical solution is used.
Function ddavies.p()
returns the density of the Davies function
but as a function of the quantile.
Function
ddavies()
gives the density,
pdavies()
gives the distribution function,
qdavies()
gives the quantile function, and
rdavies()
generates random deviates.
Robin K. S. Hankin
R. K. S. Hankin and A. Lee 2006. “A new family of non-negative distributions” Australia and New Zealand Journal of Statistics, 48(1):67–78
Gld
, fit.davies.p
,
least.squares
, skewness
params <- c(10,0.1,0.1) x <- seq(from=4,to=20,by=0.2) p <- seq(from=1e-3,to=1-1e-3,len=50) rdavies(n=5,params) least.squares(rdavies(100,params)) plot(pdavies(x,params)) plot(p,qdavies(p,params)) plot(x,ddavies(x,params),type="b")
params <- c(10,0.1,0.1) x <- seq(from=4,to=20,by=0.2) p <- seq(from=1e-3,to=1-1e-3,len=50) rdavies(n=5,params) least.squares(rdavies(100,params)) plot(pdavies(x,params)) plot(p,qdavies(p,params)) plot(x,ddavies(x,params),type="b")
Moments of order statistics of random variables drawn from a Davies distribution
davies.moment(n=1 , i=1 , order=1 , params) M(order,params) mu(params) expected.value(n,i,params) expected.value.approx(n,i,params) variance(params) skewness(params) kurtosis(params)
davies.moment(n=1 , i=1 , order=1 , params) M(order,params) mu(params) expected.value(n,i,params) expected.value.approx(n,i,params) variance(params) skewness(params) kurtosis(params)
params |
A three-member vector holding |
n |
The number of observations |
i |
Return information about the |
order |
The order (eg order=2 gives the square) |
Function davies.moment(n,i,order=r)
gives the -th moment
of the
-th order statistic of
observations. The
following aliases are just convenience wrappers with
(ie
moments of one observation from a Davies distribution):
M()
gives the -th moment for
mu()
gives the first moment of a Davies distribution
(ie the mean)
variance()
gives the second central moment of a Davies
distribution
skewness()
gives the normalized skewness of a Davies
distribution
kurtosis()
gives the normalized kurtosis of a Davies
distribution
Robin K. S. Hankin
params <- c(10,0.1,0.1) davies.moment(n=100,i=99,2,params) # ie the second moment of the 99th smallest # observation of 100 drawn from a Davies # distribution with parameters p mean(rdavies(1e6,params))-mu(params) #now reproduce the S-K graph: f <- function(x,y){c(skewness(c(1,x,y)),kurtosis(c(1,x,y)))} g <- function(j,vector,pp,qq=1){points(t(sapply(vector,f,y=j)),type="l",col="black",lty=qq)} vector <- c((0:300)/100 , (0:300)/10000 , seq(from=3,to=10,len=100)) vector <- sort(unique(vector)) plot(t(sapply((0:10)/10,f,y=0)), xlim=c(-3,3),ylim=c(0,10), type="n",xlab="skewness",ylab="kurtosis") g(0.001,vector,"red",qq=1) g(0.01,vector,"yellow",qq=2) g(0.02,vector,"green",qq=3) g(0.05,vector,"blue",qq=4) g(0.1 ,vector,"purple",qq=5) g(0.14,vector,"black",qq=6) x <- seq(from=-3,to=3,len=30) points(x,x^2+1,type="l",lwd=2) leg.txt <- expression(lambda[2]==0.001, lambda[2]==0.01,lambda[2]==0.02,lambda[2]==0.05, lambda[2]==0.1,lambda[2]==0.14) legend(-1.1,10,leg.txt,col="black",lty=1:6)
params <- c(10,0.1,0.1) davies.moment(n=100,i=99,2,params) # ie the second moment of the 99th smallest # observation of 100 drawn from a Davies # distribution with parameters p mean(rdavies(1e6,params))-mu(params) #now reproduce the S-K graph: f <- function(x,y){c(skewness(c(1,x,y)),kurtosis(c(1,x,y)))} g <- function(j,vector,pp,qq=1){points(t(sapply(vector,f,y=j)),type="l",col="black",lty=qq)} vector <- c((0:300)/100 , (0:300)/10000 , seq(from=3,to=10,len=100)) vector <- sort(unique(vector)) plot(t(sapply((0:10)/10,f,y=0)), xlim=c(-3,3),ylim=c(0,10), type="n",xlab="skewness",ylab="kurtosis") g(0.001,vector,"red",qq=1) g(0.01,vector,"yellow",qq=2) g(0.02,vector,"green",qq=3) g(0.05,vector,"blue",qq=4) g(0.1 ,vector,"purple",qq=5) g(0.14,vector,"black",qq=6) x <- seq(from=-3,to=3,len=30) points(x,x^2+1,type="l",lwd=2) leg.txt <- expression(lambda[2]==0.001, lambda[2]==0.01,lambda[2]==0.02,lambda[2]==0.05, lambda[2]==0.1,lambda[2]==0.14) legend(-1.1,10,leg.txt,col="black",lty=1:6)
Gives a “start” value for the optimization routines. Uses heuristics that seem to work.
davies.start(x, threeps=c(0.1,0.5,0.9), small = 0.01)
davies.start(x, threeps=c(0.1,0.5,0.9), small = 0.01)
x |
dataset to be used |
threeps |
a three-element vector representing the quantiles to be balanced. The default values balance the first and ninth deciles and the median. These seem to work for me pretty well; YMMV |
small |
a “small” value to be used for |
Returns a “start” value of the pararameters for use in one of the
Davies fitting routines maximum.likelihood()
or least.squares()
.
Uses three heuristic methods (one assuming , one with
,
and one with
). Returns the best one of the
three, as measured by
objective()
.
Robin K. S. Hankin
least.squares
, maximum.likelihood
,
objective
d <- rchisq(40,1) davies.start(d) least.squares(d) params <- c(10 , 0.1 , -0.1) x <- rdavies(100 , params) davies.start(x) f <- function(threeps){objective(davies.start(x,threeps),x)} (jj<-optim(c(0.1,0.5,0.9),f)) davies.start(x,jj$par) least.squares(x) #not bad at all.
d <- rchisq(40,1) davies.start(d) least.squares(d) params <- c(10 , 0.1 , -0.1) x <- rdavies(100 , params) davies.start(x) f <- function(threeps){objective(davies.start(x,threeps),x)} (jj<-optim(c(0.1,0.5,0.9),f)) davies.start(x,jj$par) least.squares(x) #not bad at all.
Returns the expected value of the Generalized Lambda Distribution
expected.gld(n=1, i=1, params) expected.gld.approx(n=1, i=1, params)
expected.gld(n=1, i=1, params) expected.gld.approx(n=1, i=1, params)
n |
Number of observations |
i |
Order statistic: |
params |
The four parameters of a GLD distribution |
expected.gld
and expected.approx
return the exact and
approximate values of the expected value of a Generalized Lambda
Distribution RV.
Exploits the fact that the gld
quantile function is the sum of
a constant and two davies
quantile functions
Robin K. S. Hankin
A. Ozturk and R. F. Dale, “Least squares estimation of the parameters of the generalized lambda distribution”, Technometrics 1985, 27(1):84 [it does not appear to be possible, as of R-2.9.1, to render the diacritic marks in the first author's names in a nicely portable way]
params <- c(4.114,0.1333,0.0193,0.1588) mean(rgld(1000,params)) expected.gld(n=1,i=1,params) expected.gld.approx(n=1,i=1,params) f <- function(n){apply(matrix(rgld(n+n,params),2,n),2,min)} #ie f(n) gives the smaller of 2 rgld RVs, n times. mean(f(1000)) expected.gld(n=2,i=1,params) expected.gld.approx(n=2,i=1,params) plot(1:100,expected.gld.approx(n=100,i=1:100,params)-expected.gld(n=100,i=1:100,params)) # not bad, eh? ....yyyeeeeesss, but the parameters given by Ozturk give # an almost zero second derivative for d(qgld)/dp, so the good agreement # isn't surprising really. Observe that the error is minimized at about # p=0.2, where the point of inflection is.
params <- c(4.114,0.1333,0.0193,0.1588) mean(rgld(1000,params)) expected.gld(n=1,i=1,params) expected.gld.approx(n=1,i=1,params) f <- function(n){apply(matrix(rgld(n+n,params),2,n),2,min)} #ie f(n) gives the smaller of 2 rgld RVs, n times. mean(f(1000)) expected.gld(n=2,i=1,params) expected.gld.approx(n=2,i=1,params) plot(1:100,expected.gld.approx(n=100,i=1:100,params)-expected.gld(n=100,i=1:100,params)) # not bad, eh? ....yyyeeeeesss, but the parameters given by Ozturk give # an almost zero second derivative for d(qgld)/dp, so the good agreement # isn't surprising really. Observe that the error is minimized at about # p=0.2, where the point of inflection is.
A convenience wrapper (and pretty-printer) for
maximum.likelihood()
and least.squares()
. Given a
dataset, it draws an empirical quantile function
(fit.davies.p()
) or PDF (fit.davies.q()
) and
superimposes the dataset.
fit.davies.p(x , print.fit=FALSE, use.q=TRUE , params=NULL, small=1e-5 , ...) fit.davies.q(x , print.fit=FALSE, use.q=TRUE , params=NULL, ...)
fit.davies.p(x , print.fit=FALSE, use.q=TRUE , params=NULL, small=1e-5 , ...) fit.davies.q(x , print.fit=FALSE, use.q=TRUE , params=NULL, ...)
x |
dataset to be fitted and plotted |
print.fit |
Boolean with |
use.q |
Boolean with |
params |
three-element vector holding the three parameters of the
davies dataset. If |
small |
small positive number showing range of quantiles to plot |
... |
Additional parameters passed to |
If print.fit
is TRUE
, return the optimal parameters
Robin K. S. Hankin
least.squares
, maximum.likelihood
fit.davies.q(rchisq(100,1)) fit.davies.p(exp(rnorm(100))) data(x00m700p4) fit.davies.q(x00m700p4)
fit.davies.q(rchisq(100,1)) fit.davies.p(exp(rnorm(100))) data(x00m700p4) fit.davies.q(x00m700p4)
Density, distribution function, quantile function and random generation for the Generalized Lambda Distribution
dgld(x, params) dgld.p(x, params) pgld(q, params) qgld(p, params) rgld(n, params)
dgld(x, params) dgld.p(x, params) pgld(q, params) qgld(p, params) rgld(n, params)
x , q
|
vector of quantiles |
p |
vector of probabilities |
n |
In function |
params |
vector of parameters: |
The Generalized Lambda distribution has quantile function
\[f(x)=\lambda_1 +(p^{\lambda_3} - (1-p)^{\lambda_4})/\lambda_2\]Function
dgld()
gives the density,
dgld.p()
gives the density in terms of the quantile,
pgld()
gives the distribution function,
qgld()
gives the quantile function, and
rgld()
generates random deviates.
M. J. Wichura 1988. “Algorithm AS 241: The Percentage Points of the Normal Distribution”. Applied Statistics, 37, 477–484.
A. Ozturk and R. F. Dale 1985. “Least squares estimation of the parameters of the generalized lambda distribution”. Technometrics 27(1):84
params <- c(4.114,0.1333,0.0193,0.1588) #taken straight from some paper gld.rv <- rgld(100,params) hist(gld.rv) fit.davies.q(gld.rv) #remember the Davies distn has 3 DF and the GLD 4...
params <- c(4.114,0.1333,0.0193,0.1588) #taken straight from some paper gld.rv <- rgld(100,params) hist(gld.rv) fit.davies.q(gld.rv) #remember the Davies distn has 3 DF and the GLD 4...
Finds the best-fit Davies distribution using either the least-squares
criterion (least.squares()
) or maximum likelihood
(maximum.likelihood()
)
least.squares(data, do.print = FALSE, start.v = NULL) maximum.likelihood(data, do.print = FALSE, start.v = NULL)
least.squares(data, do.print = FALSE, start.v = NULL) maximum.likelihood(data, do.print = FALSE, start.v = NULL)
data |
dataset to be fitted |
do.print |
Boolean with |
start.v |
A suitable starting vector of parameters
|
Uses optim()
to find the best-fit Davies distribution to a set
of data.
Function least.squares()
does not match that of Hankin
and Lee 2006.
Returns the parameters of
the best-fit Davies distribution to the dataset
data
BUGS:
Function least.squares()
does not use the same methodology of
Hankin and Lee 2006, and its use is discouraged pending implentation.
Quite apart from that, it can be screwed with bad value for
start.v
. Function maximum.likelihod()
can be very slow.
It might be possible to improve this by using some sort of hot-start
for optim()
.
Robin K. S. Hankin
davies.start
, optim
,
objective
, likelihood
p <- c(10 , 0.1 , 0.1) d <- rdavies(10,p) maximum.likelihood(d) # quite slow least.squares(d) # much faster but not recommended
p <- c(10 , 0.1 , 0.1) d <- rdavies(10,p) maximum.likelihood(d) # quite slow least.squares(d) # much faster but not recommended
Likelihood of observing data
, on the hypothesis of
their coming from a Davies distribution of parameters params
.
Function neg.log.likelihood()
gives minus the loglikelihood
likelihood(params, data)
likelihood(params, data)
params |
Parameters of the Davies distribution |
data |
dataset for which the likelihood is computed |
Robin K. S. Hankin
p1 <- c(10, 0.1, 0.1) p2 <- c(10, 0.4, 0.1) d <- rdavies(100,p1) likelihood(p1,d) likelihood(p2,d) #should be smaller. neg.log.likelihood(p1,rstupid(100)) #should be large negative.
p1 <- c(10, 0.1, 0.1) p2 <- c(10, 0.4, 0.1) d <- rdavies(100,p1) likelihood(p1,d) likelihood(p2,d) #should be smaller. neg.log.likelihood(p1,rstupid(100)) #should be large negative.
The “distance” of a dataset from a particular Davies distribution
objective(params, dataset) objective.approx(params, dataset)
objective(params, dataset) objective.approx(params, dataset)
params |
A three-member vector holding |
dataset |
The dataset to be considered |
Used by the fit.davies.p()
and fit.davies.q()
functions
objective
returns the “distance” of a
dataset from a particular
Davies distribution as measured by the sums of the squares of the
differences between observed (dataset
) and
expected (expected.value()
) values.
objective.approx()
uses expected.approx()
rather than
expected()
to calculate expectations, as per equation 6.
Robin K. S. Hankin
params <- c(10, 0.1, 0.1) x <- rdavies(100,params) objective(params,x) objective.approx(params,x) objective(least.squares(x),x) objective(davies.start(x),x)
params <- c(10, 0.1, 0.1) x <- rdavies(100,params) objective(params,x) objective.approx(params,x) objective(least.squares(x),x) objective(davies.start(x),x)
A four-element vector giving the parameters used by Ozturk.
data(x00m700p4)
data(x00m700p4)
A. Ozturk and R. F. Dale 1985. “Least squares estimation
of the parameters of the generalized lambda distribution”.
Technometrics 27(1):84; see discussion under expected.gld.Rd
.
data(ozturk) hist(rgld(100,ozturk))
data(ozturk) hist(rgld(100,ozturk))
Plots sorted p-values showing which ones would have been rejected
plotcf(y, q=0.05)
plotcf(y, q=0.05)
y |
dataset |
q |
p-value of critical region |
Sorts p-values and plots the order statistic. Useful for investigating a statistical test by using it when the null hypothesis is KNOWN to be true, just to check if the probability of rejection really is alpha.
Also can be used when H0 is wrong, showing what beta is.
Robin K. S. Hankin
f.H0.T <- function(n,free=5){t.test(rt(n,df=free))$p.value} f.H0.F <- function(n,free=5){t.test(rf(n,df1=free,df2=free))$p.value} plotcf(sapply(rep(10,100),f.H0.T)) # should reject about 5: thus # probability of a type I error is # about 0.05 (as it should be; this # is an exact test) plotcf(sapply(rep(10,100),f.H0.F)) # should reject about 80: thus # probability of a type II error is # about 0.2 for this H_A.
f.H0.T <- function(n,free=5){t.test(rt(n,df=free))$p.value} f.H0.F <- function(n,free=5){t.test(rf(n,df1=free,df2=free))$p.value} plotcf(sapply(rep(10,100),f.H0.T)) # should reject about 5: thus # probability of a type I error is # about 0.05 (as it should be; this # is an exact test) plotcf(sapply(rep(10,100),f.H0.F)) # should reject about 80: thus # probability of a type II error is # about 0.2 for this H_A.
a contrived PDF that cannot be closely approximated by a Davies distribution
rstupid(n, a = 1, b = 2, c = 3, d = 4)
rstupid(n, a = 1, b = 2, c = 3, d = 4)
n |
Number of observations |
a |
start of first uniform bit |
b |
end of first uniform bit |
c |
start of second uniform bit |
d |
end of second uniform bit |
The stupid
distribution is composed of two separate uniform
distributions: one from to
, and one from
to
. It is specifically designed to be NOT fittable to any Davies
distribution.
You could probably come up with a more stupid distribution if you tried.
Robin K. S. Hankin
stupid <- rstupid(500) fit.davies.q(stupid)
stupid <- rstupid(500) fit.davies.q(stupid)
Plots two lines and shades the bit in between them
twolines.vert(p, y1, y2, ...)
twolines.vert(p, y1, y2, ...)
p |
vector of quantiles |
y1 |
First set of ordinates |
y2 |
Second set of ordinates |
... |
Extra arguments, passed to |
Plots p
against y1
, and p
against y2
, and
shades the bit in between using vertical lines. This is useful for
comparing two order statistics
Robin K. S. Hankin
twolines.vert(1:100,sort(rnorm(100)),sort(rnorm(100))) params <- c(10 , 0.1 , 0.1) twolines.vert(1:100 , sort(rdavies(100,params)) , sort(rdavies(100,params)))
twolines.vert(1:100,sort(rnorm(100)),sort(rnorm(100))) params <- c(10 , 0.1 , 0.1) twolines.vert(1:100 , sort(rdavies(100,params)) , sort(rdavies(100,params)))
This data set gives the peak concentration for 100 independent instantaneous releases of neutral-buoyancy gas in a windtunnel
data(x00m700p4)
data(x00m700p4)
A vector containing 100 observations
D. J. Hall and others 1991. Repeat variability in instantaneously released heavy gas clouds—some wind tunnel model experiments. Technical Report LR 804 (PA), Warren Spring Laboratory, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2BX.
data(x00m700p4) fit.davies.q(x00m700p4)
data(x00m700p4) fit.davies.q(x00m700p4)