Added documentation about max_dist

bbabd7a7 · Clare · eddb955b · bbabd7a7 · bbabd7a7 · bbabd7a7
Commit bbabd7a7 authored 4 years ago by Clare
--- a/R/create_LDprofile.R
+++ b/R/create_LDprofile.R
@@ -7,7 +7,9 @@
 #' Both lists should be the same length and should correspond exactly to each other (i.e. the distances in each element of \code{dist} should go with the SNPs in the same element of x)
 #'
 #' In the output, bins represent lower bounds. The first bin contains pairs where the genetic distance is greater than or equal to 0 and less than \code{bin_size}. The final bin contains pairs where the genetic distance is greater than or equal to \code{max_dist}-\code{bin_size} and less than \code{max_dist}.
-#' If the \code{max_dist} is not an increment of \code{bin_size}, it will be adjusted to the next highest increment. The final bin will be the bin that \code{max_dist} falls into. For example, if the \code{max_dist} is given as 4.5 and the \code{bin_size} is 1, the final bin will be 4.\cr
+#' If the \code{max_dist} is not an increment of \code{bin_size}, it will be adjusted to the next highest increment. The final bin will be the bin that \code{max_dist} falls into. For example, if the \code{max_dist} is given as 4.5 and the \code{bin_size} is 1, the final bin will be 4.
+#' \code{max_dist} should be big enough to cover the genetic distances between pairs of SNPs within the window size given when the \eqn{Z_{\alpha}}{Zalpha} statistics are run. Any pairs with genetic distances bigger than \code{max_dist} will be assigned the values in the maximum bin of the LD profile.\cr
+#'
 #' By default, Beta parameters are not calculated. To fit a Beta distribution to the expected correlations, needed for the \code{\link{Zalpha_BetaCDF}} and \code{\link{Zbeta_BetaCDF}} statistics, \code{beta_params} should be set to TRUE and the package 'fitdistrplus' must be installed.
 #'
 #' Ideally, an LD profile would be generated using data from a null population with no selection, For example by using a simulation if the other population parameters are known. However, often these are unknown or complex, so generating an LD profile using the same data as is being analysed is acceptable, as long as the bins are large enough.

--- a/man/create_LDprofile.Rd
+++ b/man/create_LDprofile.Rd
@@ -28,7 +28,9 @@ The input for \code{dist} and \code{x} can be lists. This allows multiple datase
 Both lists should be the same length and should correspond exactly to each other (i.e. the distances in each element of \code{dist} should go with the SNPs in the same element of x)
 In the output, bins represent lower bounds. The first bin contains pairs where the genetic distance is greater than or equal to 0 and less than \code{bin_size}. The final bin contains pairs where the genetic distance is greater than or equal to \code{max_dist}-\code{bin_size} and less than \code{max_dist}.
-If the \code{max_dist} is not an increment of \code{bin_size}, it will be adjusted to the next highest increment. The final bin will be the bin that \code{max_dist} falls into. For example, if the \code{max_dist} is given as 4.5 and the \code{bin_size} is 1, the final bin will be 4.\cr
+If the \code{max_dist} is not an increment of \code{bin_size}, it will be adjusted to the next highest increment. The final bin will be the bin that \code{max_dist} falls into. For example, if the \code{max_dist} is given as 4.5 and the \code{bin_size} is 1, the final bin will be 4.
+\code{max_dist} should be big enough to cover the genetic distances between pairs of SNPs within the window size given when the \eqn{Z_{\alpha}}{Zalpha} statistics are run. Any pairs with genetic distances bigger than \code{max_dist} will be assigned the values in the maximum bin of the LD profile.\cr
 By default, Beta parameters are not calculated. To fit a Beta distribution to the expected correlations, needed for the \code{\link{Zalpha_BetaCDF}} and \code{\link{Zbeta_BetaCDF}} statistics, \code{beta_params} should be set to TRUE and the package 'fitdistrplus' must be installed.
 Ideally, an LD profile would be generated using data from a null population with no selection, For example by using a simulation if the other population parameters are known. However, often these are unknown or complex, so generating an LD profile using the same data as is being analysed is acceptable, as long as the bins are large enough.

--- a/vignettes/zalpha.Rmd
+++ b/vignettes/zalpha.Rmd
@@ -239,7 +239,7 @@ This code has created an LD profile with 6 columns. These are:
 * __n__ This is the number of pairs of SNPs with a genetic distance falling within this bin, whose correlations were used to calculate the statistics. 
-There is one more optional input parameter - max_dist - which sets the maximum distance SNPs can be apart for calculating for the LD profile. For real world data, Jacobs et al. (2016)[1] recommend using distances up to 2 cM assigned to bins of size 0.0001 cM.
+There is one more optional input parameter - max_dist - which sets the maximum distance SNPs can be apart for calculating for the LD profile. For real world data, Jacobs et al. (2016)[1] recommend using distances up to 2 cM assigned to bins of size 0.0001 cM. Without this parameter, the code will generate bins up to the maximum distance between pairs of SNPs, which is likely to be inefficient as most distances will not be used. max_dist should be big enough to cover the genetic distances between pairs of SNPs within the window size given when the $Z_{\alpha}$ statistics are run. Any pairs with genetic distances bigger than max_dist will be assigned the values in the maximum bin of the LD profile.
 Ideally, we would want to generate an LD profile based on genetic data without selection but exactly matching the other population parameters for our data. This could be done using simulated data (using software such as MSMS[6] or SLiM[7]). We could use another genetic dataset containing a similar population. Alternatively, we could generate an LD profile using the same dataset that we are analysing for selection. Care should be taken that bins are big enough to have a lot of data in so expected r^2^ values are not overly affected by outliers.