
in silico biology's

O09 Metagenome Analysis using NGS 16S rRNA Reads

IMC 16SrRNA Metagenome Analysis

Functions

  1. Counts 16S rRNA reads derived from metagenome samples and shows the share and frequency of the reads at six ranks of the taxonomy tree (phylum, class, order, family, genus, and species).
  2. Up to 24 metagenome samples can be analyzed.
  3. Results can be saved to files and reopened for viewing later.
  4. Charts can be printed and saved as image files.
  5. Summed-up results are saved as CSV files.
  6. Existing 16S rRNA databases can be registered as reference data and selected for use in an analysis.

Output Graphs

  1. Percentage Bar Chart and Histogram by Phylum
    • Bar chart indicating the percentage of reads that belong to each phylum
      • IMC_5.0.13_C105_004.JPG
    • Histogram indicating the frequency of reads that belong to each phylum
      • IMC_5.0.13_C105_003.JPG
  2. Percentage Bar Chart and Histogram by Class
    • Bar chart indicating the percentage of reads that belong to each class
      • IMC_5.0.13_C105_006.JPG
    • Histogram indicating the frequency of reads that belong to each class
      • IMC_5.0.13_C105_005.JPG
  3. Percentage Bar Chart and Histogram by Order
    • Bar chart indicating the percentage of reads that belong to each order
      • IMC_5.0.13_C105_008.JPG
    • Histogram indicating the frequency of reads that belong to each order
      • IMC_5.0.13_C105_007.JPG
  4. Percentage Bar Chart and Histogram by Family
    • Bar chart indicating the percentage of reads that belong to each family
      • IMC_5.0.13_C105_025.JPG
    • Histogram indicating the frequency of reads that belong to each family
      • IMC_5.0.13_C105_009.JPG
  5. Percentage Bar Chart and Histogram by Genus
    • Bar chart indicating the percentage of reads that belong to each genus
      • IMC_5.0.13_C105_026.JPG
    • Histogram indicating the frequency of reads that belong to each genus
      • IMC_5.0.13_C105_021.JPG
  6. Percentage Bar Chart and Histogram by Species
    • Bar chart indicating the percentage of reads that belong to each species
      • IMC_5.0.13_C105_027.JPG
      • IMC_5.0.13_C105_028.JPG
    • Histogram indicating the frequency of reads that belong to each species
      • IMC_5.0.13_C105_031.JPG

Restrictions

As of IMC version 5.0.13, two FastQ files containing paired-end 16S rRNA reads are required as input. The two reads of each pair must be at the same offset (record position) in their respective FastQ files.
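The pairing requirement can be checked before a long analysis is started. The sketch below is not part of IMC. It assumes plain FastQ files with 4 lines per record, and Illumina-style read names in which the two mates share a name apart from an optional /1 or /2 suffix. The function names `fastq_records`, `read_name` and `check_pairing` are hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a FastQ stream with 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FastQ record: {header!r}")
        yield header, seq, qual


def read_name(header: str) -> str:
    """Read name without '@', the description field, or a trailing /1 or /2 mate suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def check_pairing(forward: TextIO, reverse: TextIO) -> int:
    """Return the number of read pairs; raise ValueError if the two files are out of step."""
    pairs = 0
    for fwd, rev in zip_longest(fastq_records(forward), fastq_records(reverse)):
        if fwd is None or rev is None:
            raise ValueError(f"read counts differ after {pairs} pairs")
        if read_name(fwd[0]) != read_name(rev[0]):
            raise ValueError(f"pair {pairs + 1} mismatch: {fwd[0]} vs {rev[0]}")
        pairs += 1
    return pairs
```

For real files, open both with `open(path)` and pass the handles to `check_pairing`. A returned count means every record at offset N in one file has a matching mate at offset N in the other file.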

Performance

The processing time is proportional to the size of the selected reference databases and to the number of reads.

Algorithms and Data Structures

Algorithms

  1. Paired-end reads are obtained by sequencing PCR fragments of the V4 region of the 16S rRNA gene.
  2. Each read pair is aligned with the NCBI BLAST program BL2SEQ, and the two reads are joined into one consensus sequence.
  3. Each consensus sequence is used as a BLASTN homology-search query against the user-selected 16S rRNA reference databases.
  4. The top hit for each consensus sequence is used for counting.
  5. The subject sequence of each hit is looked up in the taxonomy table by its "Accession Number".
    • The taxonomy table is compiled from NCBI taxonomy data. It retrieves scientific names using the "Accession Number" as the search key.
  6. Hits are counted per taxon at each rank (phylum, class, order, family, genus, and species), and the share of each taxon is computed.
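Steps 4 to 6 can be sketched in Python as shown below. This illustrates the counting idea under stated assumptions and is not IMC's code:

  • The top-hit accessions and an accession-to-lineage table are assumed to be available already as plain Python data.
  • Hits whose accession is missing from the table are skipped.
  • Percentages are computed over the remaining hits.

The name `rank_shares` and the dictionary layout of the table are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

RANKS = ("phylum", "class", "order", "family", "genus", "species")


def rank_shares(
    top_hit_accessions: Iterable[str],
    taxonomy: Dict[str, Dict[str, str]],
) -> Dict[str, List[Tuple[str, int, float]]]:
    """Count top hits per taxon at each rank and compute each taxon's share in percent.

    top_hit_accessions: accession number of the top hit of each consensus sequence.
    taxonomy: accession number -> {rank: scientific name} (assumed structure).
    Returns rank -> [(scientific name, read count, percent)], largest count first.
    """
    counts = {rank: Counter() for rank in RANKS}
    total = 0
    for accession in top_hit_accessions:
        lineage = taxonomy.get(accession)
        if lineage is None:
            continue  # assumption: hits missing from the taxonomy table are not counted
        total += 1
        for rank in RANKS:
            # Ranks absent from a lineage are grouped under "unclassified".
            counts[rank][lineage.get(rank, "unclassified")] += 1
    return {
        rank: [(name, n, 100.0 * n / total) for name, n in counts[rank].most_common()]
        for rank in RANKS
    }
```

Keeping one counter per rank means a single pass over the hits yields all six tables that back the bar charts and histograms.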

Data Structure

  • 16S rRNA reference sequence data formats
    • DDBJ 16S rRNA FASTA format
    • Greengenes 16S rRNA FASTA format
  • Sample 16S rRNA data format
    • Paired-end reads whose 3' ends overlap must be stored in two FastQ files, with the two reads of each pair at the same offset in their respective files.
    • For multiple samples, the files must be organized in the directory structure described below.

Paired-end data for multiple samples must be stored as follows.

  • Create one root directory.
    • Under the root directory, create one subdirectory per sample.
      • In each subdirectory, place the two paired-end FastQ files of that sample.
  • Sample Root Directory
    • Sample 1 Sub Directory
      • FastQ1_1
      • FastQ1_2
    • Sample 2 Sub Directory
      • FastQ2_1
      • FastQ2_2
    • Sample 3 Sub Directory
      • FastQ3_1
      • FastQ3_2
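The sketch below shows one way such a layout could be validated before an analysis. It assumes the following:

  • Each sample subdirectory holds exactly two files with a `.fastq` or `.fq` extension.
  • The alphabetically first file holds the forward reads.

IMC's own file-matching rules are not documented on this page. `find_sample_pairs` is a hypothetical helper. The 24-sample limit comes from the Functions list.

```python
from pathlib import Path
from typing import List, Tuple

FASTQ_SUFFIXES = {".fastq", ".fq"}  # assumed file extensions
MAX_SAMPLES = 24  # sample limit stated in the Functions list


def find_sample_pairs(root: str) -> List[Tuple[str, Path, Path]]:
    """Return (sample name, forward file, reverse file) for each sample subdirectory."""
    samples = []
    for sample_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        fastqs = sorted(
            f for f in sample_dir.iterdir()
            if f.is_file() and f.suffix.lower() in FASTQ_SUFFIXES
        )
        if len(fastqs) != 2:
            raise ValueError(f"{sample_dir.name}: expected 2 FastQ files, found {len(fastqs)}")
        # Assumption: the alphabetically first file holds the forward reads.
        samples.append((sample_dir.name, fastqs[0], fastqs[1]))
    if len(samples) > MAX_SAMPLES:
        raise ValueError(f"{len(samples)} samples found; at most {MAX_SAMPLES} are supported")
    return samples
```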

Operations

File Formats

  • FastQ Format File
    • The two reads sequenced from the same PCR fragment must be at the same offset in their respective FastQ files.
    • The two reads of each pair must overlap, as reverse complements of each other, at their 3' ends.
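The sketch below illustrates what the overlap requirement makes possible. It merges one read pair by reverse-complementing the reverse read and then searching for the longest suffix/prefix overlap that has few mismatches. IMC itself uses BL2SEQ for this step, so this simple overlap scan is a swapped-in technique. The values of `min_overlap` and `max_mismatch_frac` are assumptions, as is the rule of keeping the forward base at mismatching positions.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(
    forward: str,
    reverse: str,
    min_overlap: int = 20,
    max_mismatch_frac: float = 0.1,
) -> Optional[str]:
    """Join a read pair whose 3' ends overlap, or return None if no overlap is accepted.

    The reverse read is reverse-complemented so both reads are on the forward strand;
    the end of the forward read should then match the start of the converted read.
    Overlaps are tried from longest to shortest and the first acceptable one is used.
    At mismatching overlap positions the forward base is kept (a simplification).
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    rc = reverse_complement(reverse)
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(forward[-overlap:], rc[:overlap]))
        if mismatches <= max_mismatch_frac * overlap:
            return forward + rc[overlap:]
    return None
```

A production merger would pick consensus bases by quality score and would also handle reads that run past each other. Both are omitted here.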

Result Output

  • The percentage of 16S rRNA reads assigned to each scientific name is shown.
  • Results can be saved as CSV format files.
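This page does not describe the CSV column layout that IMC writes. The sketch below assumes a hypothetical layout with one row per rank and scientific name, giving the read count and percentage. Its input is a dictionary mapping each rank to a list of (name, count, percent) tuples. `write_shares_csv` is a hypothetical helper.

```python
import csv
from typing import Dict, List, TextIO, Tuple


def write_shares_csv(
    shares: Dict[str, List[Tuple[str, int, float]]],
    out: TextIO,
) -> None:
    """Write one row per (rank, scientific name) with read count and percentage.

    The column layout is hypothetical; IMC's own CSV layout may differ.
    """
    writer = csv.writer(out)
    writer.writerow(["rank", "scientific_name", "reads", "percent"])
    for rank, rows in shares.items():
        for name, reads, percent in rows:
            writer.writerow([rank, name, reads, f"{percent:.2f}"])
```

When writing to a file, open it with `open("shares.csv", "w", newline="")`, as the Python `csv` documentation recommends, so that line endings are not doubled.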

Downloading 16SrRNA Reference Sequences

  • DDBJ 16SrRNA data
    • ftp://ftp.ddbj.nig.ac.jp/ddbj_database/16S/16S.seq.gz
  • Greengenes 16S rRNA data
    • http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/current_GREENGENES_gg16S_unaligned.fasta.gz

Registration of 16SrRNA Reference Databases

  • Before running an analysis, at least one 16S rRNA reference database must be registered.
  • To create and register a 16S rRNA reference database, select "File" --> "DB Create..." from the menu bar.
  1. Launch IMC.
  2. From the Menu Bar, click "File" --> "DB Create...".
    • The "Blast DB List" dialog is displayed.

IMC_5.0.13_C105_049.JPG

  3. Click "Add Nucleotide Sequence DB...".
    • The "Blast DB Setting" dialog is displayed.

IMC_5.0.13_C105_050.JPG

  4. Specify the 16S rRNA reference sequence files in the "Nucleotide Sequence File(s)" field. Click "Ref..." to select them with the file chooser.
  5. Enter a unique name in the "DB Name" field. It is used as the database name.

IMC_5.0.13_C105_055.JPG

  6. Select the "16SrRNA" check box.

IMC_5.0.13_C105_056.JPG

  7. To store the database on the local computer, select "Save DB in Local Directory" and click "Ref..." to specify the directory in which to save it.

IMC_5.0.13_C105_057.JPG

  8. To store the database on an external remote server instead, select "Save DB on External Server" and enter the following server settings:

    • Host Name
    • User ID
    • Password

  9. Click "Set".
    • A confirmation message window is displayed.

IMC_5.0.13_C105_060.JPG

  • Database creation starts.
  • A progress message is shown during registration.

IMC_5.0.13_C105_061.JPG

  • When registration is complete, a completion message is shown.

IMC_5.0.13_C105_062.JPG

  10. Click "OK" to close the message.
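IMC performs registration through its GUI. For comparison, a similar nucleotide BLAST database can be built outside IMC with the NCBI BLAST+ `makeblastdb` tool, which expects FASTA input. This page does not say whether IMC calls `makeblastdb` (or the older `formatdb`) internally. The wrapper functions below are hypothetical. The options they pass (`-in`, `-dbtype nucl`, `-title`, `-out`) are standard `makeblastdb` options.

```python
import os
import shutil
import subprocess
from typing import List


def makeblastdb_command(fasta_path: str, db_name: str, out_dir: str) -> List[str]:
    """Build a BLAST+ makeblastdb command line for a nucleotide reference database."""
    return [
        "makeblastdb",
        "-in", fasta_path,
        "-dbtype", "nucl",  # 16S rRNA sequences are nucleotide data
        "-title", db_name,
        "-out", os.path.join(out_dir, db_name),  # path prefix for the database files
    ]


def build_reference_db(fasta_path: str, db_name: str, out_dir: str) -> None:
    """Run makeblastdb; requires NCBI BLAST+ to be installed and on PATH."""
    if shutil.which("makeblastdb") is None:
        raise RuntimeError("makeblastdb (NCBI BLAST+) was not found on PATH")
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(makeblastdb_command(fasta_path, db_name, out_dir), check=True)
```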

Preparation of 16S rRNA Metagenome Read Files

  • Multiple samples: store all files in the directory structure described under Data Structure.
  • Single sample: specify the two files individually.

Partial Read Range Setting

A subset of the reads can be analyzed by setting the "From" and "To" fields.
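This page does not state the exact semantics of "From" and "To". The sketch below assumes they are 1-based, inclusive read numbers. It applies the same range to both files of a pair so that the mates stay together. `read_range` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def read_range(records: Iterable[T], start: int, end: int) -> Iterator[T]:
    """Return an iterator over records start..end (assumed 1-based and inclusive, like From/To)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return islice(records, start - 1, end)
```

For paired data, pass `zip(forward_records, reverse_records)` as `records`. Each selected item is then a (forward, reverse) pair, and a range can never split mates.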

Running a 16S rRNA Metagenome Analysis

  1. Launch IMC.
  2. From the menu bar, select "Genome Analysis" --> "16S rRNA Metagenome Analysis".
    • The "16S rRNA Metagenome Analysis" dialog is displayed.

IMC_5.0.13_C105_080.JPG

  3. Specify the 16S rRNA FastQ files.
    • To analyze a single sample:
      1. Specify one of the paired-end FastQ files as "Forward File".
      2. Specify the other file of the pair as "Reverse File".

IMC_5.0.13_C105_066.JPG

    • To analyze multiple samples in a single run:
      1. Specify the root directory, organized as described under Data Structure.

IMC_5.0.13_C105_068.JPG

  4. To analyze only a range of reads, clear the "Analyze whole data" check box and enter the start read position (From:) and the end read position (To:).
  5. Select one or more 16S rRNA reference databases.

IMC_5.0.13_C105_071.JPG

    • Multiple databases can be selected.
  6. To change the homology search parameters, click "Parameter".

IMC_5.0.13_C105_073.JPG

IMC_5.0.13_C105_074.JPG

  7. Click "Set".
    • The analysis starts.
    • A progress message is shown during the analysis.

IMC_5.0.13_C105_011.JPG

Viewing and Saving 16S rRNA Metagenome Analysis Results

  1. Launch IMC.
  2. From the menu bar, select "Genome Analysis" --> "Sum up of 16SrRNA".
    • The "Sum up Result" dialog is displayed.

IMC_5.0.13_C105_014.JPG

  3. Specify the name of the result file.
    • By default, the most recent analysis result is shown in the field.
  4. Click "Set".
    • The summing-up process starts.

IMC_5.0.13_C105_013.JPG

  • When summing up is complete, the "16SrRNA Metagenome Analysis" result window is displayed.

IMC_5.0.13_C105_015.JPG

Show Bar Charts Indicating the Frequency and Percentage of Reads

  1. From the Menu Bar, select "Genome Analysis" --> "Sum up of 16SrRNA Result".
    • A file chooser is displayed.
  2. Select a result file of the 16S rRNA metagenome analysis.
    • By default, the latest result file is selected.
  3. Click "Set".
    • Summing up starts, and a progress message is shown while it runs.
    • When summing up is complete, the result window is displayed with a histogram of reads by phylum.
  4. Click "Graph".
    • A new window is displayed with a bar chart of the percentage of reads by phylum.

IMC_5.0.13_C105_004.JPG

  5. Click "Close".
    • The graph window closes.
  6. Click the "Class" tab.
    • The bar chart of the percentage of reads by class is shown.

Tips

Using a smaller 16S rRNA reference database shortens processing time.

Bug Report

Unfixed Bugs

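When more than 10,000 reads are analyzed, the result file may not be generated correctly.

  • Workaround: set the megablast parameters as follows.
    • -b=1 (number of database sequences to show alignments for)
    • -v=1 (number of database sequences to show one-line descriptions for)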

Fixed Bugs

  • IMC version 5.0.13
    • Fixed: read counts were incorrect.

Recent Updates and Improvements

Future Enhancements

Support for 16S rRNA sequence data files in general formats.

Related Functions
