Title: | One-Sided Multinomial Probabilities |
---|---|
Description: | Implements multinomial CDF (P(N1<=n1, ..., Nk<=nk)) and tail probabilities (P(N1>n1, ..., Nk>nk)), as well as probabilities with both constraints (P(l1<N1<=u1, ..., lk<Nk<=uk)). Uses a method suggested by Bruce Levin (1981) <doi:10.1214/aos/1176345593>. |
Authors: | Alexander Davis |
Maintainer: | Alexander Davis <[email protected]> |
License: | AGPL-3 |
Version: | 1.0.0 |
Built: | 2024-11-01 03:23:35 UTC |
Source: | https://github.com/cran/pmultinom |
Calculate the sample size such that the probability of a result is a given amount.
invert.pmultinom(lower = -Inf, upper = Inf, probs, target.prob, method)
invert.pmultinom(lower = -Inf, upper = Inf, probs, target.prob, method)
lower |
Vector of lower bounds. Lower bounds are excluded |
upper |
Vector of upper bounds. Upper bounds are included |
probs |
Cell probabilities |
target.prob |
The probability of the event, at the output sample size. |
method |
Method used for computation. Only method currently implemented is "exact" |
If only lower is given, then the result is the smallest size such that pmultinom(lower=lower, size=size, probs=probs) >= target.prob. If only upper is given, then the result is the smallest size such that pmultinom(upper=upper, size=size, probs=probs) <= target.prob. Behavior when both lower and upper are given is not yet implemented.
The sample size parameter at which the the target probability of the given event is achieved.
Casasent, A. K., Schalck, A., Gao, R., Sei, E., Long, A., Pangburn, W., ... & Navin, N. E. (2018). Multiclonal Invasion in Breast Tumors Identified by Topographic Single Cell Sequencing. Cell. doi:10.1016/j.cell.2017.12.007
# How many cells must be sequenced to have a 95% chance of # observing at least 2 from each subclone of a tumor? (Data # from Casasent et al (2018); see vignette("pmultinom") for # details of this example) # Input: ncells <- 204 subclone.freqs <- c(43, 20, 82, 17, 5, 37)/ncells target.number <- c(2, 2, 2, 2, 2, 0) lower.bound <- target.number - 1 invert.pmultinom(lower=lower.bound, probs=subclone.freqs, target.prob=.95, method="exact") # Output: # [1] 192
# How many cells must be sequenced to have a 95% chance of # observing at least 2 from each subclone of a tumor? (Data # from Casasent et al (2018); see vignette("pmultinom") for # details of this example) # Input: ncells <- 204 subclone.freqs <- c(43, 20, 82, 17, 5, 37)/ncells target.number <- c(2, 2, 2, 2, 2, 0) lower.bound <- target.number - 1 invert.pmultinom(lower=lower.bound, probs=subclone.freqs, target.prob=.95, method="exact") # Output: # [1] 192
Calculate the probability that a multnomial random vector is between, elementwise, two other vectors.
pmultinom(lower = -Inf, upper = Inf, size, probs, method)
pmultinom(lower = -Inf, upper = Inf, size, probs, method)
lower |
Vector of lower bounds. Lower bounds are excluded |
upper |
Vector of upper bounds. Upper bounds are included |
size |
Number of draws |
probs |
Cell probabilities |
method |
Method used for computation. Only method currently implemented is "exact" |
The calculation follows the scheme suggested in Levin (1981): begin with the equivalent probability for a Poisson random vector, and update it by conditioning on the sum of the vector being equal to the size parameter, using Bayes' theorem. This requires computation of the distribution of a sum of truncated Poisson random variables, which is accomplished using convolution, as per Levin's suggestion for an exact calculation. Levin's suggestion for an approximate calculation, using Edgeworth expansions, may be added to a later version. Fast convolution is achieved using the fastest Fourier transform in the west (Frigo, Johnson 1998).
The probability that a multinomial random vector is greater than all the lower bounds, and less than or equal all the upper bounds:
P(l1 < N1 <= u1, ..., lk < Nk <= uk)
If only the upper bounds are given, then this is the multinomial CDF:
P(N1<=u1, ..., Nk<=uk)
If only the upper bounds are given, then this is the multinomial tail probability:
P(N1>l1, ..., Nk>lk)
Fougere, P. F. (1988). Maximum entropy calculations on a discrete probability space. In Maximum-Entropy and Bayesian Methods in Science and Engineering (pp. 205-234). Springer, Dordrecht. doi:10.1007/978-94-009-3049-0_10
Frigo, Matteo, and Steven G. Johnson. (2005). The design and implementation of FFTW3. Proceedings of the IEEE, 93(2), 216-231. doi:10.1109/JPROC.2004.840301
Levin, Bruce. (1981). "A Representation for Multinomial Cumulative Distribution Functions". Annals of Statistics 9 (5): 1123–6. doi:10.1214/aos/1176345593
# To determine the bias of a die, Rudolph Wolf rolled it # 20,000 times. Side 2 was the most frequently observed, and # was observed 3631 times. What is the probability that a # fair die would have a side observed this many times or # more? # Input: 1 - pmultinom(upper=rep.int(3630, 6), size=20000, probs=rep.int(1/6, 6), method="exact") # Output: # [1] 7.379909e-08 # Therefore we conclude that the die is biased. Fougere # (1988) attempted to account for these biases by assuming # certain manufacturing errors. Repeating the calculation # with the distribution Fougere derived: # Input: theoretical.dist <- c(.17649, .17542, .15276, .15184, .17227, .17122) 1 - pmultinom(upper=rep.int(3630, 6), size=20000, probs=theoretical.dist, method="exact") # Output: # [1] 0.043362 # Therefore we conclude that the die still seems more biased # than Fougere's model can explain.
# To determine the bias of a die, Rudolph Wolf rolled it # 20,000 times. Side 2 was the most frequently observed, and # was observed 3631 times. What is the probability that a # fair die would have a side observed this many times or # more? # Input: 1 - pmultinom(upper=rep.int(3630, 6), size=20000, probs=rep.int(1/6, 6), method="exact") # Output: # [1] 7.379909e-08 # Therefore we conclude that the die is biased. Fougere # (1988) attempted to account for these biases by assuming # certain manufacturing errors. Repeating the calculation # with the distribution Fougere derived: # Input: theoretical.dist <- c(.17649, .17542, .15276, .15184, .17227, .17122) 1 - pmultinom(upper=rep.int(3630, 6), size=20000, probs=theoretical.dist, method="exact") # Output: # [1] 0.043362 # Therefore we conclude that the die still seems more biased # than Fougere's model can explain.