Tutorials

RNASeq analysis walk-through

This wiki guides you through RNAseq analysis, from quality checking to obtaining differential gene expression results. The next part of the wiki series covers some of the downstream analyses you can perform on the results obtained here. Here is an overview of the RNAseq analysis covered in this tutorial.

Overview

Figure 1: Overview of the RNAseq workflow

How to view files on a remote machine without downloading them locally?

Results generated by bioinformatics programs are usually plain text files (tab- or comma-separated), PDF files, or occasionally PNG/JPEG images. Here we show you how to view these files without downloading them to your local machine.
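If you already have a Python session open on the remote machine, a `head`-like preview is easy to sketch. The function below (`preview` is a hypothetical name, not part of any tool mentioned here) reads only the first few lines of a tab-separated file and splits them into fields, so even a very large results table can be inspected in place.

```python
from itertools import islice


def preview(path, n=10, sep="\t"):
    """Return the first n lines of a delimited text file, split into fields.

    Only the first n lines are read, much like `head -n` on the command line.
    """
    with open(path, newline="") as handle:
        # islice stops after n lines, so the remainder of a large file is not processed
        return [line.rstrip("\r\n").split(sep) for line in islice(handle, n)]
```

For example, `preview("results.tsv", n=5)` (a hypothetical file name) returns the header line plus the next four rows, each as a list of fields.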

Text files

Text files are the easiest. You can view them with any of the standard UNIX commands, such as:

Merge two spreadsheets using a common column in Excel

Excel is the most popular spreadsheet tool among researchers because of its ease of use and its many useful features. Scripting is usually the most efficient way to do simple operations like this, but Excel's practicality and the cryptic syntax of scripting commands often make Excel the more convenient choice for researchers. The most common case of merging two spreadsheets is when users have one list of gene IDs and another list of gene IDs with their functions. To merge these two sheets on the gene IDs, we can use the VLOOKUP function.
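For readers who do script, the sketch below mimics an exact-match VLOOKUP, the equivalent of `=VLOOKUP(A2, Sheet2!$A:$B, 2, FALSE)` for a hypothetical layout with gene IDs in column A of both sheets and functions in column B of the second. `vlookup_merge` is a hypothetical helper: like VLOOKUP it returns the first match and `#N/A` for missing IDs, but unlike VLOOKUP its comparison is case-sensitive.

```python
def vlookup_merge(gene_ids, annotations, missing="#N/A"):
    """Pair each gene ID with its function, like an exact-match VLOOKUP.

    gene_ids: gene IDs from the first sheet, in order.
    annotations: (gene_id, function) pairs from the second sheet.
    Matching is case-sensitive, unlike Excel's VLOOKUP.
    """
    lookup = {}
    for gene_id, function in annotations:
        # keep only the first occurrence, as VLOOKUP returns the first match
        lookup.setdefault(gene_id, function)
    # IDs absent from the second sheet get a placeholder string, mirroring Excel's #N/A
    return [(gene_id, lookup.get(gene_id, missing)) for gene_id in gene_ids]


rows = vlookup_merge(["gene1", "gene2"], [("gene2", "kinase"), ("gene1", "transporter")])
print(rows)  # [('gene1', 'transporter'), ('gene2', 'kinase')]
```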

Data

Typically, users will have something like this:

Guide for installing various types of programs in Linux

This handy guide covers installing programs in a UNIX environment. Most of these steps assume that you are installing packages in a group-accessible location, without root access, and using an environment module system for package management. However, you can easily adapt these steps for other cases.

Genomescope

This program uses k-mer frequencies generated from raw read data to estimate the genome size, abundance of repetitive elements and rate of heterozygosity.

Background

A k-mer is a substring of length k in a string of DNA bases.

For example:

All 2-mers of the sequence "AATTGGCCG" are AA, AT, TT, TG, GG, GC, CC, CG.
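The definition above translates directly into code. This sketch (the function names are hypothetical) lists the k-mers of a sequence and then builds a k-mer frequency histogram, that is, how many distinct k-mers occur once, twice, and so on, which is the kind of input GenomeScope works from. It counts k-mers on the given strand only; dedicated k-mer counters usually merge each k-mer with its reverse complement, so treat this as an illustration rather than a replacement for them.

```python
from collections import Counter


def kmers(seq, k):
    """Return every k-mer (substring of length k) of seq, in order of position."""
    if k <= 0:
        raise ValueError("k must be positive")
    # a sequence of length L has L - k + 1 k-mers (none if k > L)
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_histogram(reads, k):
    """Map each frequency to the number of distinct k-mers seen that many times.

    Counts single-strand k-mers only (no reverse-complement merging).
    """
    counts = Counter()
    for read in reads:
        counts.update(kmers(read.upper(), k))
    return dict(sorted(Counter(counts.values()).items()))


print(kmers("AATTGGCCG", 2))  # ['AA', 'AT', 'TT', 'TG', 'GG', 'GC', 'CC', 'CG']
print(kmer_histogram(["AATTGGCCG", "AATTG"], 2))  # {1: 4, 2: 4}
```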

File Transfer Using Globus Connect Personal (GCP)

This tutorial explains how to transfer files to and from the GIF server. You will need the Globus Connect Personal application to facilitate the file transfer. Globus Connect Personal turns your laptop or other personal computer into a Globus endpoint and lets you transfer and share files easily. Follow the instructions to set up a Globus account and transfer files to and from the GIF server.

Generate index sheet linking all spreadsheets in Excel

Before proceeding, check that you have enabled macros: if you don't see the DEVELOPER tab in your empty spreadsheet, click FILE, then OPTIONS, then Customize Ribbon. In the list on the right-hand side, check the box for the DEVELOPER tab and click OK.

Click DEVELOPER and then Macros, type in a name (e.g., import_text), and click Create.

Paste the code below into the window that pops up:

Export multiple worksheets as separate text files in Excel

If your Excel file has a large number of worksheets (tabs) that you need to export as separate text files, follow these guidelines. Note that each worksheet label will be used as the file name of its text file, with the .txt extension.
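The same naming rule can be sketched outside Excel. Assuming the worksheet contents have already been loaded into Python (for example with a spreadsheet library; that step is not shown), the hypothetical `export_sheets` helper below writes each worksheet to `<worksheet label>.txt` as tab-delimited text.

```python
import csv
from pathlib import Path


def export_sheets(sheets, out_dir):
    """Write each worksheet to out_dir/<worksheet label>.txt as tab-delimited text.

    sheets: dict mapping worksheet label -> list of rows (each row a list of cells).
    Returns the list of paths written, in the order of the dict.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for label, rows in sheets.items():
        path = out / f"{label}.txt"  # the worksheet label becomes the file name
        with open(path, "w", newline="") as handle:
            # Unix line endings; Excel's own export on Windows would use CRLF
            csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)
        written.append(path)
    return written
```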

Import multiple text files as separate worksheets in Excel

If you have a large number of text files that you need to import as separate worksheets, follow these guidelines. Note that the file name of each text file, without the .txt extension, will be used to label its worksheet (tab).

Before proceeding, check that you have enabled macros: if you don't see the DEVELOPER tab in your empty spreadsheet, click FILE, then OPTIONS, then Customize Ribbon. In the list on the right-hand side, check the box for the DEVELOPER tab and click OK.
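The file-name-to-label rule is easy to express in code. This sketch (hypothetical helpers, not part of the macro) derives a worksheet label from each .txt file name. It also replaces characters Excel forbids in sheet names with `_` and truncates labels to Excel's 31-character limit; both are choices made here rather than steps from this guide, and truncation can make two long file names collide.

```python
from pathlib import Path

EXCEL_MAX_SHEET_NAME = 31       # Excel limits worksheet names to 31 characters
INVALID_CHARS = set("\\/?*[]:")  # characters Excel does not allow in sheet names


def sheet_name_for(path):
    """Derive a worksheet label from a text file name by dropping the .txt extension."""
    name = Path(path).name
    if name.lower().endswith(".txt"):
        name = name[:-4]
    # replace characters Excel rejects (an assumption of this sketch, not the macro)
    name = "".join("_" if c in INVALID_CHARS else c for c in name)
    return name[:EXCEL_MAX_SHEET_NAME]


def collect_text_files(folder):
    """Map worksheet label -> file path for every .txt file in folder, sorted by name."""
    return {sheet_name_for(p): p for p in sorted(Path(folder).glob("*.txt"))}
```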

Downloading all SRA files related to a BioProject/study

The NCBI Sequence Read Archive (SRA) stores sequence and quality data (fastq files), in aligned or unaligned formats, from next-generation sequencing platforms. A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project. Often, a single BioProject holds a considerable number of experiments, and downloading them all individually gets tedious.
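One common way to script the download is to collect the run accessions for the BioProject and loop over them with the SRA Toolkit. The sketch below assumes you have already saved the project's RunInfo table as CSV text (for example with Entrez Direct: `esearch -db sra -query <BioProject> | efetch -format runinfo`) and that the table has a `Run` column. The helper names are hypothetical; check the SRA Toolkit documentation for the `prefetch` and `fasterq-dump` options your version supports.

```python
import csv
import io


def run_accessions(runinfo_csv_text):
    """Return the Run accessions (e.g. SRR...) listed in an SRA RunInfo CSV table."""
    reader = csv.DictReader(io.StringIO(runinfo_csv_text))
    # skip empty values and repeated header rows (e.g. if several tables were concatenated)
    return [row["Run"] for row in reader if row.get("Run") and row["Run"] != "Run"]


def download_commands(accessions):
    """Build one shell line per run: download with prefetch, then convert with fasterq-dump."""
    return [f"prefetch {acc} && fasterq-dump {acc}" for acc in accessions]
```

Writing the output of `download_commands(...)` to a file, one command per line, gives a shell script that fetches every run of the project in turn.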
