Skip to content

Commit 268e5da

Browse files
Added vignette and prepared for release
1 parent 52aa083 commit 268e5da

File tree

6 files changed

+281
-27
lines changed

6 files changed

+281
-27
lines changed

.Rbuildignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
^\.travis\.yml$
22
^README\.html$
3+
^README\.Rmd$
34
^.+\.Rproj$
45
^\.Rproj\.user$
56
^scratch$

R/heat_tree_matrix.R

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -17,9 +17,9 @@
1717
#' @examples
1818
#' \dontrun{
1919
#' # Parse dataset for plotting
20-
#' x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
21-
#' class_key = c(tax_rank = "info", tax_name = "taxon_name"),
22-
#' class_regex = "^(.+)__(.+)$")
20+
#' x <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
21+
#' class_key = c(tax_rank = "info", tax_name = "taxon_name"),
22+
#' class_regex = "^(.+)__(.+)$")
2323
#'
2424
#' # Convert counts to proportions
2525
#' x$data$otu_table <- calc_obs_props(x, dataset = "tax_data", cols = hmp_samples$sample_id)

README.Rmd

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
[![Build Status](https://travis-ci.org/grunwaldlab/metacoder.png?branch=master)](https://travis-ci.org/grunwaldlab/metacoder?branch=master) [![codecov.io](https://codecov.io/github/grunwaldlab/metacoder/coverage.svg?branch=master)](https://codecov.io/github/grunwaldlab/metacoder?branch=master)
2+
[![Downloads from Rstudio mirror per month](http://cranlogs.r-pkg.org/badges/metacoder)](http://www.r-pkg.org/pkg/metacoder)
3+
[![Downloads from Rstudio mirror](http://cranlogs.r-pkg.org/badges/grand-total/metacoder)](http://www.r-pkg.org/pkg/metacoder)
4+
[![CRAN version](http://www.r-pkg.org/badges/version/metacoder)](https://cran.r-project.org/package=metacoder)
5+
6+
![](readme_figure.png)
7+
8+
An R package for metabarcoding research planning and analysis
9+
-------------------------------------------------------------
10+
11+
Metabarcoding is revolutionizing microbial ecology and presenting new challenges:
12+
13+
- Numerous database formats make taxonomic data difficult to parse, combine, and subset.
14+
- Stacked bar charts, commonly used to depict community diversity, lack taxonomic context and are limited by the number of discernible colors.
15+
- Barcode loci and primers are a source of under-explored bias.
16+
17+
Metacoder is an R package that attempts to addresses these issues:
18+
19+
- Sources of taxonomic data can be extracted from most file formats and manipulated.
20+
- Community diversity can be visualized by color and size in a tree plot.
21+
- Primer specificity can be estimated with *in silico* PCR.
22+
23+
## Installation
24+
25+
Stable releases are available on CRAN and can be installed in the standard way:
26+
27+
install.packages("metacoder")
28+
29+
The most recent version can be installed from Github:
30+
31+
devtools::install_github("ropensci/taxa")
32+
devtools::install_github("grunwaldlab/metacoder")
33+
library(metacoder)
34+
35+
36+
```{r child = 'vignettes/introduction.Rmd'}
37+
```

cran-comments.md

Lines changed: 2 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
## Test environments
22

33
* local desktop: ubuntu 16.04 install, R 3.4.0
4-
* on travis-ci: ubuntu 12.04.5 + R 3.4.0, devel [2017-05-22 r72718]
4+
* on travis-ci: ubuntu 14.04.5 + R 3.4.2, devel, bioc-release [2018-01-04]
55
* win-builder
66

77
## R CMD check results
@@ -16,18 +16,7 @@ There were no ERRORs, WARNINGs or NOTEs.
1616

1717
### On winbuilder:
1818

19-
```
20-
* checking whether package 'metacoder' can be installed ... WARNING
21-
Found the following significant warnings:
22-
Warning: Installed Rcpp (0.12.11) different from Rcpp used to build dplyr (0.12.10).
23-
Warning: Installed R (R Under development (unstable) (2017-05-20 r72713)) different from R used to build dplyr (R Under development (unstable) (2017-05-20 r72708)).
24-
See 'd:/RCompile/CRANguest/R-devel/metacoder.Rcheck/00install.out' for details.
25-
```
26-
27-
I think this is because dplyr 0.6 will be released to CRAN soon, but not yet.
28-
This version of metacoder is compatible with both the old and new versions of dplyr.
29-
One of the major reasons for this update is to be compatible with dplyr 0.6.
30-
19+
There were no ERRORs, WARNINGs or NOTEs.
3120

3221
## Downstream dependencies
3322

man/heat_tree_matrix.Rd

Lines changed: 3 additions & 3 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

vignettes/introduction.Rmd

Lines changed: 235 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -4,29 +4,256 @@ author: "Zachary S. L. Foster and Niklaus J. Grünwald"
44
date: "`r Sys.Date()`"
55
output: rmarkdown::html_vignette
66
vignette: >
7-
%\VignetteIndexEntry{An introduction to MetacodeR}
7+
%\VignetteIndexEntry{An introduction to metacoder}
88
%\VignetteEngine{knitr::rmarkdown}
99
%\VignetteEncoding{UTF-8}
1010
---
11-
11+
1212
```{r home_setup, echo=FALSE, warning=FALSE, message=FALSE}
13-
options(width = 80)
13+
options(width = 90)
1414
set.seed(1)
1515
# Knitr
1616
library(knitr)
1717
library(grid)
18-
opts_chunk$set(dev = 'png', fig.width = 7, fig.height = 7, warning = FALSE, message = FALSE)
18+
opts_chunk$set(dev = 'png', fig.width = 7, fig.height = 7, warning = TRUE, message = TRUE)
1919
```
2020

21-
2221
## Documentation
2322

24-
This is only a short introduction.
23+
This is only a short demonstration.
2524
See the full documentation at http://grunwaldlab.github.io/metacoder_documentation.
2625

27-
UNDER CONSTRUCTION
26+
## Parsing
27+
28+
Many functions that used to be in `metacoder` have now been moved into the `taxa` package.
29+
These include the flexible parsers and dplyr-like data-manipulation functions.
30+
If you have an non-standard data format or want to use the more flexible `taxa` parsers, check out the intro to the taxa package [here](https://github.com/ropensci/taxa).
31+
`Metacoder` now has functions for parsing specific file formats used in metagenomics research.
32+
However, for this demonstration, we will be using a parser from the `taxa` package meant for tabular data.
33+
34+
35+
Included in `metacoder` is an example dataset that is a subset of the Human Microbiome Project data.
36+
This dataset has two parts:
37+
38+
* An abundance matrix called `hmp_otus`, with samples in columns and OTUs in rows
39+
* A sample information table called `hmp_samples`, with samples as rows and columns of information describing the samples (e.g. gender).
40+
41+
This is the preferred way to encode this type of abundance information in `metacoder` and `taxa`.
42+
Lets take a look at this data:
43+
44+
```{r}
45+
library(metacoder)
46+
print(hmp_otus)
47+
print(hmp_samples)
48+
```
49+
50+
We can parse the taxonomic information in the abundance matrix using a parser from `taxa`:
51+
52+
```{r}
53+
obj <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
54+
class_key = c(tax_rank = "info", tax_name = "taxon_name"),
55+
class_regex = "^(.+)__(.+)$")
56+
57+
```
58+
59+
This returns a `taxmap` object.
60+
The `taxmap` class is designed to store any number of tables, lists, or vectors associated with taxonomic information and facilitate manipulating the data in a cohesive way.
61+
Here is what that object looks like:
62+
63+
```{r}
64+
print(obj)
65+
```
66+
67+
## Abundance matrix manipulations
68+
69+
### Removing low-abundance counts
70+
71+
Low-abundance sequences might be the result of sequencing error, so typically we remove any counts/OTUs with less than some number of reads.
72+
Lets set all counts with less than 5 reads to zero:
73+
74+
```{r}
75+
obj$data$tax_data <- zero_low_counts(obj, "tax_data", min_count = 5)
76+
```
77+
78+
There might now be some OTUs with no "real" reads. Lets check:
79+
80+
```{r}
81+
no_reads <- rowSums(obj$data$tax_data[, hmp_samples$sample_id]) == 0
82+
sum(no_reads)
83+
```
84+
85+
It appears that `r sum(no_reads)` of `r nrow(obj$data$tax_data)` OTUs now have no reads.
86+
We can remove those OTUs and their associated taxa with `filter_obs`:
87+
88+
```{r}
89+
obj <- filter_obs(obj, "tax_data", ! no_reads, drop_taxa = TRUE)
90+
print(obj)
91+
```
92+
93+
Note how there are fewer taxa now, as well as fewer OTUs.
94+
This coordinated manipulation of taxonomic and abundance data is one of the main benefits of using the `taxmap` class.
95+
96+
97+
### Accounting for un-even sampling
98+
99+
These are raw counts, but people typically work with rarefied counts or proportions to avoid sampling depth biasing the results.
100+
The function `rarefy_obs` will return the rarefied counts for a table in a taxmap object, but lets use proportions for this demonstration:
101+
102+
```{r}
103+
obj$data$tax_data <- calc_obs_props(obj, "tax_data")
104+
print(obj)
105+
```
106+
107+
108+
### Getting per-taxon information
109+
110+
Currently, we have values for the abundance of each OTU, not each taxon.
111+
To get information on the taxa, we can sum the abundance per-taxon like so:
112+
113+
```{r}
114+
obj$data$tax_abund <- calc_taxon_abund(obj, "tax_data",
115+
cols = hmp_samples$sample_id)
116+
print(obj)
117+
```
118+
119+
Note that there is now an additional table with one row per taxon.
120+
121+
We can also easily calculate the number of samples have reads for each taxon:
122+
123+
```{r}
124+
obj$data$tax_occ <- calc_n_samples(obj, "tax_abund", groups = hmp_samples$body_site)
125+
print(obj)
126+
```
127+
128+
129+
### Plotting taxonomic data
130+
131+
Now that we have per-taxon information, we can plot the information using heat trees.
132+
The code below plots the number of "Nose" samples that have reads for each taxon.
133+
It also plots the number of OTUs assigned to each taxon in the overall dataset.
134+
135+
```{r}
136+
heat_tree(obj,
137+
node_label = taxon_names,
138+
node_size = n_obs,
139+
node_color = Nose,
140+
node_size_axis_label = "OTU count",
141+
node_color_axis_label = "Samples with reads")
142+
```
143+
144+
Note how we did not have to specify the full path to the variable "Nose", but just its name.
145+
This is a shorthand for convenience.
146+
We could have made the same plot using this command:
147+
148+
```{r, eval = FALSE}
149+
heat_tree(obj,
150+
node_label = obj$taxon_names(),
151+
node_size = obj$n_obs(),
152+
node_color = obj$data$tax_occ$Nose,
153+
node_size_axis_label = "OTU count",
154+
node_color_axis_label = "Samples with reads")
155+
```
156+
157+
158+
### Comparing two treatments/groups
159+
160+
Usually we are interested in how groups of samples compare.
161+
For example, we might want to know which taxa differ between the nose and throat, or between men and women.
162+
The function `compare_groups` facilitates these comparisons:
163+
164+
```{r, warning = FALSE}
165+
obj$data$diff_table <- compare_groups(obj, dataset = "tax_abund",
166+
cols = hmp_samples$sample_id,
167+
groups = hmp_samples$sex)
168+
print(obj$data$diff_table)
169+
```
170+
171+
We can use this information to create what we call a "differential heat tree", which indicates which taxa are more abundant in each treatment:
172+
173+
```{r}
174+
heat_tree(obj,
175+
node_label = taxon_names,
176+
node_size = n_obs,
177+
node_color = log2_median_ratio,
178+
node_color_interval = c(-2, 2),
179+
node_color_range = c("cyan", "gray", "tan"),
180+
node_size_axis_label = "OTU count",
181+
node_color_axis_label = "Log 2 ratio of median proportions")
182+
```
183+
184+
In this case, taxa colored tan are more abundant in women and those colored blue are more abundant in men.
185+
Note that we have not taken into account statistics significance when showing this, so lets do that.
186+
First, we need to correct for multiple comparisons:
187+
188+
```{r}
189+
obj$data$diff_table$wilcox_p_value <- p.adjust(obj$data$diff_table$wilcox_p_value,
190+
method = "fdr")
191+
```
192+
193+
If we then look at the distribution of p-values, we can see that none are even close to significant:
194+
195+
```{r}
196+
hist(obj$data$diff_table$wilcox_p_value)
197+
```
198+
199+
There is no need to graph this, but if there still were some significant differences, we could set any difference that is not significant to zero and repeat the last `heat_tree` command.
200+
201+
### Comparing any number of treatments/groups
202+
203+
A single differential heat tree can compare two treatments, but what if you have more?
204+
Then we can make a matrix of heat trees, one for each pairwise comparison of treatments like so:
205+
206+
```{r, warning = FALSE}
207+
obj$data$diff_table <- compare_groups(obj, dataset = "tax_abund",
208+
cols = hmp_samples$sample_id,
209+
groups = hmp_samples$body_site)
210+
print(obj$data$diff_table)
211+
```
28212

29-
## More inforamtion
213+
There is a special function to plot this type of data called `heat_tree_matrix`:
214+
215+
```{r}
216+
heat_tree_matrix(obj,
217+
dataset = "diff_table",
218+
node_size = n_obs,
219+
node_label = taxon_names,
220+
node_color = log2_median_ratio,
221+
node_color_range = diverging_palette(),
222+
node_color_trans = "linear",
223+
node_color_interval = c(-3, 3),
224+
edge_color_interval = c(-3, 3),
225+
node_size_axis_label = "Number of OTUs",
226+
node_color_axis_label = "Log2 ratio median proportions")
227+
```
228+
229+
230+
## More information
30231

31232
This document is only a short introduction to metacoder and there is much that is not covered.
32233
For more information, see our website at http://grunwaldlab.github.io/metacoder_documentation/ and our github repository at https://github.com/grunwaldlab/metacoder.
234+
There is also extensive help and examples in the function documentation that can be accessed by, for example, `?heat_tree`.
235+
236+
## Feedback
237+
238+
We welcome any kind of feedback!
239+
Let us know if you run into problems by submitting an issue on our Github repo: https://github.com/grunwaldlab/metacoder
240+
241+
## Dependencies
242+
243+
The function that runs *in silico* PCR requires `primersearch` from the EMBOSS tool kit to be installed. This is not an R package, so it is not automatically installed. Type `?primersearch` after installing and loading metacoder for installation instructions.
244+
245+
## Citation
246+
247+
If you use metcoder in a publication, please cite our [article in PLOS Computational Biology](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005404):
248+
249+
Foster ZSL, Sharpton TJ, Grünwald NJ (2017) Metacoder: An R package for visualization and manipulation of community taxonomic diversity data. PLOS Computational Biology 13(2): e1005404. https://doi.org/10.1371/journal.pcbi.1005404
250+
251+
## License
252+
253+
This work is subject to the [MIT License](https://github.com/grunwaldlab/metacoder/blob/master/LICENSE).
254+
255+
## Acknowledgements
256+
257+
This package includes code from the R package [ggrepel](https://github.com/slowkow/ggrepel) to handle label overlap avoidance with permission from the author of [ggrepel](https://github.com/slowkow/ggrepel) [Kamil Slowikowski](https://github.com/slowkow).
258+
We included the code instead of depending on `ggrepel` because we are using functions internal to `ggrepel` that might change in the future.
259+
We thank Kamil Slowikowski for letting us use his code and would like to acknowledge his implementation of the label overlap avoidance used in metacoder.

0 commit comments

Comments
 (0)