A Primer

The 16S rRNA (ribosomal RNA) gene encodes the RNA component of the 30S small subunit of the ribosome found in prokaryotic cells, and the V4 hypervariable region is one section of this gene. Some sections of the gene are highly conserved across bacterial species, while variation within the hypervariable regions is used to reconstruct phylogeny. Hence, the members of highly complex bacterial communities are commonly identified using 16S rRNA gene sequencing.

Selecting the OTU threshold

Operational taxonomic units (OTUs) are a way of grouping 16S rRNA gene sequences together based on their sequence similarity. These OTUs are then compared to a reference database to infer likely taxonomy. It is therefore important to select a similarity threshold that can properly distinguish between genuine variation in the 16S sequence and artificial variation introduced through sequencing error. There is some debate over what the best threshold is; however, I have decided to use the default threshold of 97% (as recommended by QIIME at the time of writing). Furthermore, experimentation with different (i.e. more stringent) threshold settings did not yield significant differences in the total number of sequences.
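To make the effect of the threshold concrete, here is a minimal toy sketch in Python. It is not QIIME's clustering algorithm. It assumes the reads are already aligned to the same length, measures similarity as the fraction of matching positions, and assigns each read to the first OTU whose seed it matches at or above the threshold. The function names (identity, greedy_otus, mutate) and the 100-base example reads are hypothetical and exist only for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: list[str], threshold: float = 0.97) -> list[list[str]]:
    """Greedy clustering: a read joins the first OTU whose seed it matches
    at >= threshold, otherwise it becomes the seed of a new OTU."""
    otus: list[list[str]] = []  # element 0 of each OTU is its seed
    for read in reads:
        for otu in otus:
            if identity(read, otu[0]) >= threshold:
                otu.append(read)
                break
        else:
            otus.append([read])
    return otus


def mutate(seq: str, positions: list[int]) -> str:
    """Illustration helper: swap the base at each position for a different one."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for i in positions:
        chars[i] = swap[chars[i]]
    return "".join(chars)


# Made-up 100-base reads: a seed, a copy with 2 mismatches (98% identical)
# and a copy with 5 mismatches (95% identical).
seed = "ACGT" * 25
reads = [seed, mutate(seed, [0, 1]), mutate(seed, [0, 1, 2, 3, 4])]

for t in (0.97, 0.99):
    print(f"threshold {t:.2f}: {len(greedy_otus(reads, t))} OTUs")
```

At 97% the read with two mismatches is absorbed into the seed's OTU, as if the differences were sequencing error, while the read with five mismatches forms its own OTU. At 99% all three reads end up in separate OTUs. The number of reads is the same in both cases; only the way they are grouped changes.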

Quality Checking (QC)

Chimeras