Software Requirements

This course relies on several software packages. So that every student obtains results comparable to everyone else's, the course uses a specific version of each package. Each package, its version, and its purpose are listed below.

| Software | Version | Purpose |
|---|---|---|
| R | 4.1.3 | A free software environment for statistical computing and graphics. |
| FastQC | 0.11.9 | A quality control tool for high-throughput sequence data. |
| MultiQC | 1.13a | Summarizes the output from numerous bioinformatics tools into a single report. |
| SRA Tools | 2.11.0 | A collection of tools and libraries for using data in the INSDC Sequence Read Archives. |
| FastP | 0.23.2 | Provides fast all-in-one preprocessing for FASTQ files. |
| SamTools | 1.15.1 | A suite of programs for interacting with high-throughput sequencing data. |
| NOVOPlasty | 4.3.1 | A de novo assembler and heteroplasmy/variance caller for short circular genomes. |
| BWA | 0.7.17 | A software package for mapping low-divergent sequences against a large reference genome. |
| Jellyfish | 2.2.10 | A tool for fast, memory-efficient counting of k-mers in DNA. |
| Hifiasm | 0.16.1 | A fast haplotype-resolved de novo assembler for PacBio HiFi reads. |
| Mummer | 3.23 | Ultra-fast alignment of large-scale DNA and protein sequences. |
| BedTools | 2.30.0 | A Swiss Army knife of tools for a wide range of genomics analysis tasks. |
| RepeatMasker | 4.1.2.p1 | Screens DNA sequences for interspersed repeats and low-complexity DNA sequences. |
| Braker2 | 2.1.6 | Gene structural annotation. |
| BUSCO | 4.1.2 | Assesses genome assembly and annotation completeness using single-copy orthologs. |
| EDTA | 2.0.1 | Automated whole-genome de novo TE annotation and benchmarking. |
| hic_qc | git commit 6881c33 | Performs QC checks for Hi-C libraries using reads in a BAM file aligned to the genome assembly. The version used is a commit made to the source repository on June 27, 2022. |
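Because results can differ between versions, it can help to confirm that the tools installed on your system match the versions listed above. The following is a minimal sketch, not an official course script. It covers only four of the tools and assumes each one accepts a `--version` flag and is on your `PATH`. The helper names (`extract_version`, `matches_pinned`, `installed_version_output`) are hypothetical and chosen for illustration.

```python
import re
import shutil
import subprocess

# Pinned versions copied from the list above (a subset of tools only).
PINNED = {
    "fastqc": "0.11.9",
    "samtools": "1.15.1",
    "bedtools": "2.30.0",
    "jellyfish": "2.2.10",
}

# Matches dotted version numbers such as 1.15.1 or 0.11.9.
VERSION_RE = re.compile(r"\d+(?:\.\d+)+")


def extract_version(text):
    """Return the first dotted version number found in text, or None."""
    match = VERSION_RE.search(text)
    return match.group(0) if match else None


def matches_pinned(tool, version_output):
    """True if the version reported in version_output equals the pinned one."""
    return extract_version(version_output) == PINNED[tool]


def installed_version_output(tool):
    """Run `<tool> --version` and return its output, or None if not installed.

    Assumption: the tool accepts a --version flag. Not every tool in the
    list does, so check a tool's own documentation before adding it here.
    """
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        [tool, "--version"], capture_output=True, text=True, check=False
    )
    # Some tools print their version to stderr, so combine both streams.
    return result.stdout + result.stderr


if __name__ == "__main__":
    for tool in PINNED:
        output = installed_version_output(tool)
        if output is None:
            print(f"{tool}: not found on PATH")
        elif matches_pinned(tool, output):
            print(f"{tool}: OK ({PINNED[tool]})")
        else:
            found = extract_version(output)
            print(f"{tool}: expected {PINNED[tool]}, found {found}")
```

The comparison is an exact string match on the first version number in the output, so a tool that reports a different patch release is flagged as a mismatch. Tools pinned to a git commit, such as hic_qc, need a different check and are left out of this sketch.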