Exploring PacBio data for Beginners -- Quality Control

June 22, 2016
Blog Post

Iowa State University now has a PacBio machine and there are going to be lots of questions about how to analyze the data. So I thought this would be a good first post for the blog.

Everything that I am documenting here comes from the stsPlots GitHub repository (https://github.com/PacificBiosciences/stsPlots) and from reading the PowerPoint documentation provided with it.

I am loading an older version of R because many of the required packages are not yet available through install.packages for the newer versions.

These libraries are prerequisites for running stsPlots.R for QC of PacBio runs.

  # The module name is site-specific; adjust it to match your cluster.
  module load R/3.1.3
  R
  install.packages("ggplot2")
  install.packages("reshape2")
  install.packages("plyr")

Change to the directory where you plan to do the QC.

Now that the packages are installed, let's grab the stsPlots.R functions from GitHub.

  wget https://github.com/PacificBiosciences/stsPlots/raw/master/stsPlots.R
  # Also create soft links to all sts files. I take advantage of the xargs command to make this really easy.
  find ../../protein/ -name "*sts.csv" | xargs -I xx ln -s xx
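If you want to see what the find and xargs combination does before pointing it at real data, here is a minimal sketch on a throwaway directory. The folder and file names (protein/cell1, a.sts.csv and so on) are made up for illustration and are not from a real run.

```shell
# Build a scratch directory tree that mimics a run folder (hypothetical names).
scratch=$(mktemp -d)
mkdir -p "$scratch/protein/cell1" "$scratch/protein/cell2" "$scratch/qc"
touch "$scratch/protein/cell1/a.sts.csv" "$scratch/protein/cell2/b.sts.csv"

# Link every *sts.csv file into the QC directory.
# -I xx tells xargs to substitute each found path for the placeholder xx.
cd "$scratch/qc"
find ../protein/ -name "*sts.csv" | xargs -I xx ln -s xx .

# Count the links that resolve to real files (-L follows the symlinks).
linked=$(find -L . -maxdepth 1 -type f -name "*sts.csv" | wc -l | tr -d ' ')
echo "linked $linked sts.csv files"
```

Each link gets the same base name as its target, so two sts files with identical names in different subfolders would collide. If that could happen in your data, check the count against what you expect.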

Now that we have all the files we need, let's look at the QC. Start R again and load the libraries.

  library(ggplot2)
  library(reshape2)
  library(plyr)
  source("stsPlots.R")

Note that folder names require a double slash for every slash. If you are already in the QC folder when you start R from the command line, you can execute it as follows.

  runStsPlots(c("-folder",".//"))
  q()
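If your sts files live somewhere other than the current folder, the double-slash rule gets tedious to apply by hand. Here is a small sketch that doubles the slashes for you and prints the R call to paste in. The helper name double_slashes and the path /home/user/pacbio/qc/ are hypothetical, and I am assuming the rule applies to absolute paths the same way it does to the "./" case above.

```shell
# Double every slash in a path, following the rule described above.
double_slashes() {
    printf '%s\n' "$1" | sed 's#/#//#g'
}

# Hypothetical QC folder; replace with your own path.
qc_dir="/home/user/pacbio/qc/"

# Print the R call with the converted folder name.
r_folder=$(double_slashes "$qc_dir")
echo "runStsPlots(c(\"-folder\", \"$r_folder\"))"
```

Passing "./" to the helper gives back ".//", which matches the call used in this post.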

All plots can now be found in the ./Analysis_Results folder. For interpretation, I highly recommend exploring the PowerPoint: https://github.com/PacificBiosciences/stsPlots/blob/master/stsPlots_Usage.pptx