Skip to content
Snippets Groups Projects
Select Git revision
  • ee7fc3230d1204362ac624185aeda87f84f97645
  • master default protected
  • v0.2.0
  • v0.1.0
4 results

Zalpha_rsq_over_expected.R

Blame
  • Zalpha_rsq_over_expected.R 9.50 KiB
    
    
    #' Runs the Zalpha function on the r-squared values over the expected r-squared values for the region
    #'
    #' Returns a \eqn{Z_{\alpha}^{r^2/E[r^2]}}{Zalpha} value for each SNP location supplied to the function, based on
    #' the expected \eqn{r^2} values given an LD profile and genetic distances.
    #' For more information about the \eqn{Z_{\alpha}^{r^2/E[r^2]}}{Zalpha} statistic, please see Jacobs (2016).
    #' The \eqn{Z_{\alpha}^{r^2/E[r^2]}}{Zalpha} statistic is defined as:
    #' \deqn{{Z_{\alpha}^{r^2/E[r^2]}}=\frac{{|L| \choose 2}^{-1}\sum_{i,j \in L}r^2_{i,j}/E[r^2_{i,j}] + {|R| \choose 2}^{-1}\sum_{i,j \in R}r^2_{i,j}/E[r^2_{i,j}]}{2}}
    #' where \code{|L|} and \code{|R|} are the number of SNPs to the left and right of the current locus within the given window \code{ws}, \eqn{r^2}{r^2} is equal to
    #' the squared correlation between a pair of SNPs, and \eqn{E[r^2]}{E[r^2]} is equal to the expected squared correlation between a pair of SNPs, given an LD profile.
    #'
    #' The LD profile describes the expected correlation between SNPs at a given genetic distance, generated using simulations or
    #' real data. Care should be taken to utilise an LD profile that is representative of the population in question. The LD
    #' profile should consist of evenly sized bins of distances (for example 0.0001 cM per bin), where the value given is the (inclusive) lower
    #' bound of the bin. Ideally, an LD profile would be generated using data from a null population with no selection, however one can be generated
    #' using this data. See the \code{\link{create_LDprofile}} function for more information on how to create an LD profile.
    #'
    #' @importFrom stats cor na.omit
    #'
    #' @param pos A numeric vector of SNP locations
    #' @param ws The window size which the \eqn{Z_{\alpha}^{r^2/E[r^2]}}{Zalpha} statistic will be calculated over. This should be on the same scale as the \code{pos} vector.
    #' @param x A matrix of SNP values. Columns represent chromosomes; rows are SNP locations. Hence, the number of rows should equal the length of the \code{pos} vector. SNPs should all be biallelic.
    #' @param dist A numeric vector of genetic distances (e.g. cM, LDU). This should be the same length as \code{pos}.
    #' @param LDprofile_bins A numeric vector containing the lower bound of the bins used in the LD profile. These should be of equal size.
    #' @param LDprofile_rsq A numeric vector containing the expected \eqn{r^2}{r^2} values for the corresponding bin in the LD profile. Must be between 0 and 1.
    #' @param minRandL Minimum number of SNPs in each set R and L for the statistic to be calculated. Default is 4.
    #' @param minRL Minimum value for the product of the set sizes for R and L. Default is 25.
    #' @param X Optional. Specify a region of the chromosome to calculate \eqn{Z_{\alpha}^{r^2/E[r^2]}}{Zalpha} for in the format \code{c(startposition, endposition)}. The start position and the end position should be within the extremes of the positions given in the \code{pos} vector. If not supplied, the function will calculate \eqn{Z_{\alpha}^{r^2/E[r^2]}}{Zalpha} for every SNP in the \code{pos} vector.
    #'
    #' @return A list containing the SNP positions and the \eqn{Z_{\alpha}^{r^2/E[r^2]}}{Zalpha} values for those SNPs
    #' @references Jacobs, G.S., T.J. Sluckin, and T. Kivisild, \emph{Refining the Use of Linkage Disequilibrium as a Robust Signature of Selective Sweeps.} Genetics, 2016. \strong{203}(4): p. 1807
    #' @examples
    #' ## load the snps and LDprofile example datasets
    #' data(snps)
    #' data(LDprofile)
    #' ## run Zalpha_rsq_over_expected over all the SNPs with a window size of 3000 bp
    #' Zalpha_rsq_over_expected(snps$bp_positions,3000,as.matrix(snps[,3:12]),snps$cM_distances,
    #'  LDprofile$bin,LDprofile$rsq)
    #' ## only return results for SNPs between locations 600 and 1500 bp
    #' Zalpha_rsq_over_expected(snps$bp_positions,3000,as.matrix(snps[,3:12]),snps$cM_distances,
    #'  LDprofile$bin,LDprofile$rsq,X=c(600,1500))
    #'
    #' @export
    #' @seealso \code{\link{create_LDprofile}}
    Zalpha_rsq_over_expected<-function(pos, ws, x, dist, LDprofile_bins, LDprofile_rsq, minRandL = 4, minRL = 25, X = NULL){
    
      #Check things are in the correct format
    
      #Check pos is a numeric vector
      if (is.numeric(pos) ==FALSE || is.vector(pos)==FALSE){
        stop("pos must be a numeric vector")
      }
      #Check x is a matrix
      if (is.matrix(x)==FALSE){
        stop("x must be a matrix")
      }
      #Check x has rows equal to the length of pos
      if (length(pos) != nrow(x)){
        stop("The number of rows in x must equal the number of SNP locations given in pos")
      }
      #Check SNPs are all biallelic
      if(sum(apply(x,1,function(x){length(na.omit(unique(x)))}) != 2)>0){
        stop("SNPs must all be biallelic")
      }
      #Check dist is a numeric vector
      if (is.numeric(dist) ==FALSE || is.vector(dist)==FALSE){
        stop("dist must be a numeric vector")
      }
      #Check dist is the same length as pos