---
title: "Introduction to the archaeacentre package"
author:
  - name: Richard Stöckl
    affiliation:
    - Archaea Centre Regensburg
    email: richard.stoeckl@ur.de
output:
  BiocStyle::html_document:
    self_contained: yes
    toc: true
    toc_float: true
    toc_depth: 2
    code_folding: show
date: "`r doc_date()`"
package: "`r pkg_ver('archaeacentre')`"
vignette: >
  %\VignetteIndexEntry{Introduction to archaeacentre}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
)
```

# Basics

## Install `archaeacentre`

`R` is an open-source statistical environment whose functionality can be extended with packages. `archaeacentre` is an `R` package available from its [GitHub](https://github.com/richardstoeckl/archaeacentre) repository.

Get the latest stable `R` release from [CRAN](http://cran.r-project.org/). Then install the development version of `archaeacentre` from [GitHub](https://github.com/richardstoeckl/archaeacentre) with:

```{r 'install_dev', eval = FALSE}
# Install BiocManager first if it is not available yet
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Install remotes together with the development version of archaeacentre
BiocManager::install(c("remotes", "richardstoeckl/archaeacentre"))
```

# Presets for growth curves of microbial growth data

## Plot a basic growth curve

```{r 'basic growth curve', eval = TRUE, echo = TRUE}
# 1. load the package
library(archaeacentre)

# 2. load some test data included with the package
testData <- archaeacentre::growthData

# 3. plot the growth curves in their most basic way, with the presets for "robert":
archaeacentre::plotGrowthCurve(testData, timepoint, concentration, grouping = c("timepoint", "organism"), organism, type = "robert")
```

## Modify your growth curve plot

Because `plotGrowthCurve()` is a wrapper around a ggplot2 plot with some default settings, you can modify the returned plot with the usual ggplot2 functions.
Here is one example:

```{r 'modify growth curve plot', eval = TRUE, echo = TRUE}
library(ggplot2)

# Add a title and axis labels, rename the legend, and facet by organism
archaeacentre::plotGrowthCurve(testData, timepoint, concentration, grouping = c("timepoint", "organism"), organism, type = "robert") +
    ggplot2::labs(title = "Growth curve of some test data", x = "Timepoint after inoculation in [h]", y = "Concentration in [cells/mL]") +
    ggplot2::guides(color = ggplot2::guide_legend(title = "Species")) +
    ggplot2::facet_grid(~organism)
```

# Get PDB Annotation data for a given PDB filename as returned by Foldseek

[Foldseek](https://github.com/steineggerlab/foldseek) is a fast search tool for comparing protein structures. When searching for similar structures in the PDB, Foldseek returns a table that contains a "target" column. For searches against the PDB, this column holds the PDB filename of each hit, which is not easily interpretable. The [Research Collaboratory for Structural Bioinformatics (RCSB)](https://www.rcsb.org/) provides structural and functional annotations for macromolecules stored in the Protein Data Bank (PDB). The `get_pfam_annotation_for_targets()` function automates the extraction of Pfam domain annotations for the target PDB filenames returned by Foldseek, using the [RCSB GraphQL API](https://data.rcsb.org/index.html#gql-api).

```{r 'get_pfam_annotation_for_targets', eval = TRUE, echo = TRUE}
# Get a vector of PDB filenames. This could be the "target" column of a Foldseek search result table.
targets <- c("1U04_assembly1.cif_A", "8HL4_assembly1.cif_L18P")

# Get the Pfam annotations for these targets. As the RCSB API has a rate limit, we recommend using a batch size of 1000 or lower.
pfam_results <- get_pfam_annotation_for_targets(targets, batch_size = 1000)

head(pfam_results)
```
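
To make the structure of these target names more concrete, here is a minimal base `R` sketch that splits them into PDB ID, assembly number, and chain. It assumes every target has the form `<PDB ID>_assembly<N>.cif_<chain>`, as in the two examples above; targets from other Foldseek databases may be named differently. `parse_foldseek_target()` is a hypothetical helper written for illustration. It is not part of `archaeacentre` and is not necessarily how `get_pfam_annotation_for_targets()` works internally.

```r
# Hypothetical helper (not part of archaeacentre): split Foldseek PDB target
# names of the assumed form "<PDB ID>_assembly<N>.cif_<chain>" into parts.
parse_foldseek_target <- function(targets) {
    pattern <- "^([0-9A-Za-z]{4})_assembly([0-9]+)\\.cif_(.+)$"
    matches <- regmatches(targets, regexec(pattern, targets))

    # Names that do not match the assumed pattern yield NA in every field
    parts <- t(vapply(
        matches,
        function(m) if (length(m) == 4) m[2:4] else rep(NA_character_, 3),
        character(3)
    ))

    data.frame(
        target = targets,
        pdb_id = parts[, 1],
        assembly = as.integer(parts[, 2]),
        chain = parts[, 3],
        stringsAsFactors = FALSE
    )
}

parsed <- parse_foldseek_target(c("1U04_assembly1.cif_A", "8HL4_assembly1.cif_L18P"))
parsed
```

A table like this could be joined back to the Foldseek results on the `target` column, so that each hit carries a readable PDB ID and chain next to its annotation.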