1.3 Lab Exercises

Overview

In this lab, we will learn how to organize our project, then write and run a loop in bash.

We will do four major things in this lab:

  • Organize our data and the project

  • Make symbolic links to raw data

  • Write a bash loop

  • Run fastqc in a bash loop

Give it a go, be patient, and ask questions.

“Some people feel the rain. Others just get wet.” Bob Dylan

Task A: Organize your directories and clean up

Step 1. Project organization

First, let’s talk a little about data management and organization. Do you have a lot of files in your home directory? Is it cluttered? Clean up! Delete things you don’t need anymore, using the rm command carefully. Remember: once you remove a file with rm, it’s gone for good.
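As a minimal sketch of a careful cleanup, the example below works inside a throwaway directory so that nothing real is deleted. The directory and the file names (old-notes.txt, keep-me.txt) are hypothetical and exist only for illustration.

```bash
# Hypothetical scratch directory, so this demo cannot touch your real files
scratch=$(mktemp -d)
touch "$scratch/old-notes.txt" "$scratch/keep-me.txt"

# Look before you delete: list the files first
ls "$scratch"

# -v prints each file as it is removed.
# When working interactively, rm -i asks for confirmation before each deletion.
rm -v "$scratch/old-notes.txt"

# Confirm that only the intended file is gone
ls "$scratch"
```

Listing the directory before and after the rm is a cheap habit that catches most mistakes.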

Today we’re going to start our first analysis of the Toomer’s Oak data by working with the raw Illumina data, so let’s go over some best practices in project management. Keep everything organized! Create a new directory called “toomers-genome” in your home directory, and within it create four directories to represent the four major data types we’ll be generating.

# This quickly gets you to your home directory
cd ~

mkdir toomers-genome
cd toomers-genome

# We can make multiple directories at once with a single mkdir command
mkdir shotgun-dna rna-seq pacbio hi-c

Note

The pound sign / hashtag (#) is the most common symbol for leaving a comment in a piece of code. Many interpreted languages (like bash, Perl, and Python) ignore any line that starts with #. Annotate your scripts, and leave yourself notes, using this symbol! Treat your code like your lab notebook.
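To illustrate the note above, here is a small sketch of comments in bash. The variable name and its value are made up for the example.

```bash
# A full-line comment: bash skips this line entirely
sample="toomers-oak"   # a comment can also follow a command on the same line

echo "Sample: $sample"

# echo "this line is commented out, so it never runs"
```

Commenting out a line, as in the last line above, is a handy way to switch off a command temporarily without deleting it.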

Change directory into shotgun-dna. Then make two more directories called raw-data and fastqc. Your directory structure should look like this:

Project directory structure:

toomers-genome
├── hi-c
├── pacbio
├── rna-seq
└── shotgun-dna
    ├── fastqc
    └── raw-data

I used the tree command to make this directory tree.
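If you get stuck, one possible way to do this step is sketched below. It assumes the project lives at ~/toomers-genome, as created earlier, and it uses find to check the result because tree is not installed on every system.

```bash
# Assumption: the project directory is ~/toomers-genome.
# -p creates any missing parent directories and does not complain
# if a directory already exists.
mkdir -p ~/toomers-genome/shotgun-dna
cd ~/toomers-genome/shotgun-dna

# Make the two new directories inside shotgun-dna
mkdir -p raw-data fastqc

# List every directory in the project, sorted, to check the layout
find ~/toomers-genome -type d | sort
```

If tree is available on your system, running tree ~/toomers-genome gives the same information in the drawn layout.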

If you’re struggling to make directories and move around, or if this tree-like structure of directories doesn’t make sense, be sure to review modules 1 and 2 of Linux Survival.