| Title: | Multi-Modal Immune Repertoire Analytics for Immunotherapy and Vaccine Design in R |
|---|---|
| Description: | A comprehensive analytics framework for building reproducible pipelines on T-cell and B-cell immune receptor repertoire data. Delivers multi-modal immune profiling (bulk, single-cell, CITE-seq/AbSeq, spatial, immunogenicity data), feature engineering (ML-ready feature tables and matrices), and biomarker discovery workflows (cohort comparisons, longitudinal tracking, repertoire similarity, enrichment). Provides a user-friendly interface to widely used AIRR methods — clonality/diversity, V(D)J usage, similarity, annotation, tracking, and many more. Think Scanpy or Seurat, but for AIRR data, a.k.a. Adaptive Immune Receptor Repertoire, VDJ-seq, RepSeq, or VDJ sequencing data. A successor to our previously published "tcR" R package (Nazarov 2015). |
| Authors: | Vadim I. Nazarov [aut, cre] (ORCID: <https://orcid.org/0000-0003-3659-2709>), Vasily O. Tsvetkov [aut], Aleksandr A. Popov [aut], Ivan Balashov [aut] |
| Maintainer: | Vadim I. Nazarov <[email protected]> |
| License: | Apache License (>= 2.0) |
| Version: | 0.10.3.9002 |
| Built: | 2026-05-21 08:18:44 UTC |
| Source: | https://github.com/immunomind/immunarch |
Get a column's name using the input alias
.quant_column_choice(x).quant_column_choice(x)
x |
Character vector of length 1. |
A string with the column name.
immunarch:::.quant_column_choice("count") immunarch:::.quant_column_choice("freq")
Amino acid / codon table
AA_TABLEAA_TABLE
An object of class table of length 65.
Add a new class attribute
add_class(.obj, .class)add_class(.obj, .class)
.obj |
R object. |
.class |
String with the desired class name. |
Input object with additional class .class.
tmp <- "abc" class(tmp) tmp <- immunarch:::add_class(tmp, "new_class") class(tmp)
A family of functions to quantify receptor overabundance per repertoire. Helps in deciphering the structure and partition the repertoire.
Supported methods are the following.
airr_clonality_line - build ranked abundance lines: for each
repertoire, take the top limit receptors by count and attach repertoire
metadata. Useful for per-repertoire rank-abundance plots.
airr_clonality_rank - aggregate clonal space by rank bins.
Receptors are ordered by proportion within each repertoire; each receptor
is assigned to the smallest threshold in bins that contains its rank.
airr_clonality_prop - aggregate clonal space by proportion bins.
Each receptor is assigned to a named bin according to its proportion
(e.g., Hyperexpanded >= 1e-2, Large >= 1e-3, ...). Thresholds are matched in
descending order; unmatched receptors fall into "Ultra-rare".
airr_clonality_line( idata, limit = 1e+05, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_clonality_rank( idata, bins = c(10, 30, 100, 300, 1000, 10000, 1e+05), autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_clonality_prop( idata, bins = c(Hyperexpanded = 0.01, Large = 0.001, Medium = 1e-04, Small = 1e-05, Rare = 1e-06), autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") )airr_clonality_line( idata, limit = 1e+05, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_clonality_rank( idata, bins = c(10, 30, 100, 300, 1000, 10000, 1e+05), autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_clonality_prop( idata, bins = c(Hyperexpanded = 0.01, Large = 0.001, Medium = 1e-04, Small = 1e-05, Rare = 1e-06), autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") )
idata |
An |
limit |
Positive integer >= 10: maximum number of top receptors to keep
per repertoire (default |
autojoin |
Logical. If TRUE, join repertoire metadata by the schema repertoire id.
Change the default behaviour by calling |
format |
String. One of |
bins |
A named numeric vector of thresholds (e.g.,
|
airr_clonality_lineA tibble with columns:
repertoire_id - repertoire identifier
index - rank within repertoire (1 = most abundant)
count - receptor count used for ranking
plus any repertoire metadata columns carried from idata$repertoires
airr_clonality_rankA tibble with
repertoire_id
clonal_rank_bin - the rank threshold (e.g., 10, 100, ...)
occupied_prop - sum of proportion within the bin
plus repertoire metadata columns from idata$repertoires
airr_clonality_propA tibble with
repertoire_id
clonal_prop_bin - factor-like label from names(bins) or "Ultra-rare"
occupied_prop - sum of proportion within the bin
plus repertoire metadata columns from idata$repertoires
Per-repertoire summaries: annotate_clonality
Data container: immundata::ImmunData
# Limit the number of threads used by the underlying DB for this session. # Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers). db_exec("SET threads TO 1") # Load data ## Not run: immdata <- get_test_idata() |> agg_repertoires("Therapy") ## End(Not run) # # airr_clonality_line # ## Not run: top_line <- airr_clonality_line(immdata, limit = 1000) ## End(Not run) # # airr_clonality_rank # ## Not run: rank_stat <- airr_clonality_rank(immdata, bins = c(10, 100)) ## End(Not run) # # airr_clonality_prop # ## Not run: prop_stat <- airr_clonality_prop(immdata) ## End(Not run)# Limit the number of threads used by the underlying DB for this session. # Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers). db_exec("SET threads TO 1") # Load data ## Not run: immdata <- get_test_idata() |> agg_repertoires("Therapy") ## End(Not run) # # airr_clonality_line # ## Not run: top_line <- airr_clonality_line(immdata, limit = 1000) ## End(Not run) # # airr_clonality_rank # ## Not run: rank_stat <- airr_clonality_rank(immdata, bins = c(10, 100)) ## End(Not run) # # airr_clonality_prop # ## Not run: prop_stat <- airr_clonality_prop(immdata) ## End(Not run)
A family of functions that extract core descriptive statistics from an ImmunData object.
Supported methods are the following.
airr_desc_chains — count V(D)J chains per repertoire
(optionally split by locus). Quickly gauges capture depth per repertoire
and, when split by locus, reveals TRA/TRB/IGH balance. Use it for QC,
library-size checks, and to spot locus-specific dropouts or
over-representation.
airr_desc_lengths — count the number of sequence lengths
per repertoire. Summarizes the CDR3 length distribution, a sensitive QC
fingerprint of repertoire prep and selection. Helpful for detecting
primer/UMI biases, comparing cohorts, and deriving length-based features for
models.
airr_desc_genes - count V(D)J gene segments per repertoire,
optionally split by locus and using either receptor counts or barcode/UMI
counts as the measure. Profiles V/D/J gene usage to characterize repertoire
composition and germline biases, with optional locus split. Useful for
cohort comparisons, flagging clonal expansions, and producing ML-ready
features for repertoire-level ML tasks.
airr_desc_chains( idata, locus_col = NA, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_desc_lengths( idata, seq_col = "cdr3_aa", autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_desc_genes( idata, gene_col = "v_call", level = c("receptor", "barcode"), by = c(NA, "locus"), autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") )airr_desc_chains( idata, locus_col = NA, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_desc_lengths( idata, seq_col = "cdr3_aa", autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_desc_genes( idata, gene_col = "v_call", level = c("receptor", "barcode"), by = c(NA, "locus"), autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") )
idata |
An |
locus_col |
Column in |
autojoin |
Logical. If TRUE, join repertoire metadata by the schema repertoire id.
Change the default behaviour by calling |
format |
String. One of |
seq_col |
Character vector with names of the columns containing sequences. |
gene_col |
A single column name in |
level |
One of |
by |
Either |
airr_desc_chains Returns a tibble with columns:repertoire_id – repertoire identifier
locus – TRA, TRB, IGH, ... (present only if locus_col is supplied)
n_chains – number of chains
airr_desc_lengths Returns a tibble with columns:repertoire_id – repertoire identifier
seq_len – lengths of sequences
n – number of receptors
airr_desc_genes A tibble with columns:repertoire_id - repertoire identifier
(optional) locus - TRA, TRB, IGH, ... (present only when by = "locus"
and the locus column exists)
<gene_col> - the gene segment value (e.g., V gene)
n - the measure:
if level = "receptor": number of receptors carrying the gene segment
if level = "barcode": sum of counts across receptors for the segment
# Limit the number of threads used by the underlying DB for this session. # Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers). db_exec("SET threads TO 2") # Load data ## Not run: immdata <- get_test_idata() |> agg_repertoires("Therapy") ## End(Not run) # # airr_desc_chains # ## Not run: airr_desc_chains(immdata) ## End(Not run) # # airr_desc_lengths # ## Not run: airr_desc_lengths(immdata) ## End(Not run) # # airr_desc_genes # ## Not run: # V gene usage by receptor count airr_desc_genes(immdata, gene_col = "v_call", level = "receptor") # V gene usage by summed cell/UMI counts (if a count column is present) airr_desc_genes(immdata, gene_col = "v_call", level = "barcode") # Split by locus (TRA/TRB/... if locus column exists) airr_desc_genes(immdata, gene_col = "v_call", level = "receptor", by = "locus") ## End(Not run)# Limit the number of threads used by the underlying DB for this session. # Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers). db_exec("SET threads TO 2") # Load data ## Not run: immdata <- get_test_idata() |> agg_repertoires("Therapy") ## End(Not run) # # airr_desc_chains # ## Not run: airr_desc_chains(immdata) ## End(Not run) # # airr_desc_lengths # ## Not run: airr_desc_lengths(immdata) ## End(Not run) # # airr_desc_genes # ## Not run: # V gene usage by receptor count airr_desc_genes(immdata, gene_col = "v_call", level = "receptor") # V gene usage by summed cell/UMI counts (if a count column is present) airr_desc_genes(immdata, gene_col = "v_call", level = "barcode") # Split by locus (TRA/TRB/... if locus column exists) airr_desc_genes(immdata, gene_col = "v_call", level = "receptor", by = "locus") ## End(Not run)
A family of functions to quantify receptor diversity per repertoire. A characteristic of a whole repertoire.
Supported methods are the following.
airr_diversity_dxx - coverage diversity: minimal number of
top receptors needed to reach perc% of clonal space (by proportion).
Great for spotting dominance/overexpansion and for quick, interpretable dashboards
(e.g., D50 = receptors to cover half of the repertoire).
airr_diversity_chao1 - Chao1 estimator is a nonparameteric
asymptotic estimator of species richness (number of species in a population).
One of the most used methods for estimating immune repertoire diversity.
airr_diversity_shannon - Shannon entropy (base 2) per repertoire
computed from proportion. Ideal when you want a single evenness-aware
diversity score; pair with Pielou/Hill for samples with very different richness.
airr_diversity_pielou - Pielou's evenness H / log2(S) with
richness S. Best when you need a size-normalized evenness score that's
comparable across repertoires with different receptor counts.
airr_diversity_index - convenience alias for Hill number with
q = 1 (exp(Shannon) using natural log). A solid default single metric
that's relatively robust to rare-count noise and easy to compare across samples.
airr_diversity_hill - Hill numbers ("true diversity") for
orders q \eqn{\in}{in} {0, 1, 2, ...}: q=0 richness, q=1 exp(Shannon), q>1
emphasizes abundant receptors. Perfect when you want a diversity profile
that tunes sensitivity to rare vs. abundant clonotypes.
airr_diversity_dxx( idata, perc = 50, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_diversity_chao1( idata, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_diversity_shannon( idata, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_diversity_pielou( idata, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_diversity_index( idata, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_diversity_hill( idata, q = 0:5, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") )airr_diversity_dxx( idata, perc = 50, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_diversity_chao1( idata, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_diversity_shannon( idata, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_diversity_pielou( idata, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_diversity_index( idata, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) airr_diversity_hill( idata, q = 0:5, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") )
idata |
An |
perc |
A number or numeric vector in |
autojoin |
Logical. If TRUE, join repertoire metadata by the schema repertoire id.
Change the default behaviour by calling |
format |
String. One of |
q |
A scalar or vector of non-negative orders. Defaults to |
airr_diversity_dxxA tibble with:
imd_repertoire_id
perc
dxx - minimal count of top receptors to reach perc%
plus repertoire metadata from idata$repertoires
airr_diversity_chao1A tibble with:
imd_repertoire_id
Estimator - number of species
SD - standard deviation for the estimator value
Conf.95.lo - CI 0.025
Conf.95.hi - CI 0.975
plus repertoire metadata from idata$repertoires
airr_diversity_shannonA tibble with:
imd_repertoire_id
shannon - entropy in bits
airr_diversity_pielouA tibble with:
imd_repertoire_id
shannon
n_receptors
pielou - evenness in [0, 1] (NA if S <= 1)
airr_diversity_indexA tibble with:
imd_repertoire_id
q = 1
hill_number
plus repertoire metadata from idata$repertoires
airr_diversity_hillA tibble with:
imd_repertoire_id
q - Hill order
hill_number - true diversity of order q
plus repertoire metadata from idata$repertoires
# Limit the number of threads used by the underlying DB for this session. # Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers). db_exec("SET threads TO 1") # Load data ## Not run: immdata <- get_test_idata() |> agg_repertoires("Therapy") ## End(Not run) # # airr_diversity_dxx # ## Not run: d50 <- airr_diversity_dxx(immdata, perc = 50) d_multi <- airr_diversity_dxx(immdata, perc = c(20, 50, 80)) ## End(Not run) # # airr_diversity_chao1 # ## Not run: chao <- airr_diversity_chao1(immdata) ## End(Not run) # # airr_diversity_shannon # ## Not run: sh <- airr_diversity_shannon(immdata) ## End(Not run) # # airr_diversity_pielou # ## Not run: pj <- airr_diversity_pielou(immdata) ## End(Not run) # # airr_diversity_index # ## Not run: idx <- airr_diversity_index(immdata) ## End(Not run) # # airr_diversity_hill # ## Not run: hill <- airr_diversity_hill(immdata, q = c(0, 1, 2)) ## End(Not run)# Limit the number of threads used by the underlying DB for this session. # Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers). db_exec("SET threads TO 1") # Load data ## Not run: immdata <- get_test_idata() |> agg_repertoires("Therapy") ## End(Not run) # # airr_diversity_dxx # ## Not run: d50 <- airr_diversity_dxx(immdata, perc = 50) d_multi <- airr_diversity_dxx(immdata, perc = c(20, 50, 80)) ## End(Not run) # # airr_diversity_chao1 # ## Not run: chao <- airr_diversity_chao1(immdata) ## End(Not run) # # airr_diversity_shannon # ## Not run: sh <- airr_diversity_shannon(immdata) ## End(Not run) # # airr_diversity_pielou # ## Not run: pj <- airr_diversity_pielou(immdata) ## End(Not run) # # airr_diversity_index # ## Not run: idx <- airr_diversity_index(immdata) ## End(Not run) # # airr_diversity_hill # ## Not run: hill <- airr_diversity_hill(immdata, q = c(0, 1, 2)) ## End(Not run)
A small family of helpers that add clonality labels to each receptor in an immundata::ImmunData object.
annotate_clonality_rank() - label by rank bins within each repertoire.
annotate_clonality_prop() - label by proportion bins (named thresholds).
annotate_clonality_rank() - for each repertoire, receptors are ordered by
within-repertoire abundance (proportion) and assigned a rank bin label.
annotate_clonality_prop() - label each receptor by proportion bin
using named thresholds (matched in descending order; else "Ultra-rare").
annotate_clonality_rank( idata, bins = c(10, 30, 100, 300, 1000, 10000, 1e+05), autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) annotate_clonality_prop( idata, bins = c(Hyperexpanded = 0.01, Large = 0.001, Medium = 1e-04, Small = 1e-05, Rare = 1e-06), autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") )annotate_clonality_rank( idata, bins = c(10, 30, 100, 300, 1000, 10000, 1e+05), autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) annotate_clonality_prop( idata, bins = c(Hyperexpanded = 0.01, Large = 0.001, Medium = 1e-04, Small = 1e-05, Rare = 1e-06), autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") )
idata |
An immundata::ImmunData object. |
bins |
A named numeric vector of thresholds (e.g.,
|
autojoin |
Logical. If TRUE, join repertoire metadata by the schema repertoire id.
Change the default behaviour by calling |
format |
String. One of |
An immundata::ImmunData whose $annotations gains:
clonal_rank_bin - integer-like label with the applied rank threshold
(outside all thresholds -> NA).
An immundata::ImmunData whose $annotations gains:
clonal_prop_bin - label from names(bins) or "Ultra-rare".
Per-repertoire summaries: airr_clonality
Data container: immundata::ImmunData
## Not run: idata <- get_test_idata() |> agg_repertoires("Therapy") idata_rank <- annotate_clonality_rank(idata) idata_prop <- annotate_clonality_prop(idata) ## End(Not run)## Not run: idata <- get_test_idata() |> agg_repertoires("Therapy") idata_rank <- annotate_clonality_rank(idata) idata_prop <- annotate_clonality_prop(idata) ## End(Not run)
Apply the given function to every pair in the given datalist. Function either symmetrical (i.e. fun(x,y) == fun(y,x)) or assymmetrical (i.e. fun(x,y) != fun(y,x)).
apply_symm(.datalist, .fun, ..., .diag = NA, .verbose = TRUE) apply_asymm(.datalist, .fun, ..., .diag = NA, .verbose = TRUE)apply_symm(.datalist, .fun, ..., .diag = NA, .verbose = TRUE) apply_asymm(.datalist, .fun, ..., .diag = NA, .verbose = TRUE)
.datalist |
List with some data.frames. |
.fun |
Function to apply, which return basic class value. |
... |
Arguments passsed to .fun. |
.diag |
Either NA for NA or something else != NULL for .fun(x,x). |
.verbose |
if TRUE then output a progress bar. |
Matrix with values M(i,j) = fun(datalist(i), datalist(j))
data(immdata) apply_symm(immdata$data, function(x, y) { nrow(x) + nrow(y) })data(immdata) apply_symm(immdata$data, function(x, y) { nrow(x) + nrow(y) })
A dataset with BCR data for testing and examplatory purposes.
bcrdatabcrdata
A list of two elements. The first element ("data") is a list of 1 element named "full_clones" that contains immune repertoire data frame. The second element ("meta") is empty metadata table.
List of immune repertoire data frames.
Metadata
...
Nucleotide to amino acid sequence translation
bunch_translate(.seq, .two.way = TRUE, .ignore.n = FALSE)bunch_translate(.seq, .two.way = TRUE, .ignore.n = FALSE)
.seq |
Vector or list of strings. |
.two.way |
Logical. If TRUE (default) then translate from the both ends (like MIXCR). |
.ignore.n |
Logical. If FALSE (default) then return NA for sequences that have N, else parse triplets with N as ~ |
Character vector of translated input sequences.
data(immdata) head(bunch_translate(immdata$data[[1]]$CDR3.nt))data(immdata) head(bunch_translate(immdata$data[[1]]$CDR3.nt))
Check if the given .data is a distribution and normalise it if necessary with an optional Laplace correction.
check_distribution( .data, .do.norm = NA, .laplace = 1, .na.val = 0, .warn.zero = FALSE, .warn.sum = TRUE )check_distribution( .data, .do.norm = NA, .laplace = 1, .na.val = 0, .warn.zero = FALSE, .warn.sum = TRUE )
.data |
Numeric vector of values. |
.do.norm |
One of the three values - NA, TRUE or FALSE. If NA then checks for distrubution (sum(.data) == 1) and normalises if needed with the given laplace correction value. if TRUE then does the normalisation and laplace correction. If FALSE then doesn't do either normalisaton or laplace correction. |
.laplace |
Value for the laplace correction. |
.na.val |
Replace all NAs with this value. |
.warn.zero |
if TRUE then the function checks if in the resulted vector (after normalisation) are any zeros, and prints a warning message if there are some. |
.warn.sum |
if TRUE then the function checks if the sum of resulted vector (after normalisation) is equal to one, and prints a warning message if not. |
Numeric vector.
immunarch:::check_distribution(c(1, 2, 3)) immunarch:::check_distribution(c(1, 2, 3), TRUE) immunarch:::check_distribution(c(1, 2, 3), FALSE)
Filter out clonotypes with non-coding, coding, in-frame or out-of-frame CDR3 sequences:
coding() - remove all non-coding sequences (i.e., remove all sequences with stop codons and frame shifts);
noncoding() - remove all coding sequences (i.e., leave sequences with stop codons and frame shifts only);
inframes() - remove all out-of-frame sequences (i.e., remove all sequences with frame shifts);
outofframes() - remove all in-frame sequences (i.e., leave sequences with frame shifts only).
Note: the function will remove all clonotypes sequences with NAs in the CDR3 amino acid column.
coding(.data) noncoding(.data) inframes(.data) outofframes(.data)coding(.data) noncoding(.data) inframes(.data) outofframes(.data)
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections, Apache Spark DataFrame from "copy_to" or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
Filtered data frame.
data(immdata) immdata_cod <- coding(immdata$data) immdata_cod1 <- coding(immdata$data[[1]])data(immdata) immdata_cod <- coding(immdata$data) immdata_cod1 <- coding(immdata$data[[1]])
Annotate clonotypes by matching them to known condition-associated immune receptors in a database. Before using this function, you must download or load the relevant database files. For more information, see the online tutorial.
dbAnnotate(.data, .db, .data.col, .db.col)dbAnnotate(.data, .db, .data.col, .db.col)
.data |
The data to process. It can be a data.frame, a data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections, or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.db |
A data frame or a data table with an immune receptor database. See dbLoad on how to load databases into R. |
.data.col |
Character vector. Vector of columns in the input repertoires to use for clonotype search. E.g., |
.db.col |
Character vector. Vector of columns in the database to use for clonotype search. The order must match the order of ".data.col".
E.g., if ".data.col" is |
Data frame with input sequences and counts or proportions for each of the input repertoire.
data(immdata) #' # Example file path file_path <- paste0(system.file(package = "immunarch"), "/extdata/db/vdjdb.example.txt") # Load the database with human-only TRB-only receptors for all known antigens db <- dbLoad(file_path, "vdjdb", "HomoSapiens", "TRB") res <- dbAnnotate(immdata$data, db, "CDR3.aa", "cdr3") resdata(immdata) #' # Example file path file_path <- paste0(system.file(package = "immunarch"), "/extdata/db/vdjdb.example.txt") # Load the database with human-only TRB-only receptors for all known antigens db <- dbLoad(file_path, "vdjdb", "HomoSapiens", "TRB") res <- dbAnnotate(immdata$data, db, "CDR3.aa", "cdr3") res
The function automatically detects the database format and loads it into R. Additionally, the function provides a general query interface to databases that allows filtering by species, chain types (i.e., locus) and pathology (i.e., antigen species).
Currently we support three popular databases:
VDJDB - https://github.com/antigenomics/vdjdb-db
McPAS-TCR - https://friedmanlab.weizmann.ac.il/McPAS-TCR/
TBAdb from PIRD - https://db.cngb.org/pird/
dbLoad(.path, .db, .species = NA, .chain = NA, .pathology = NA)dbLoad(.path, .db, .species = NA, .chain = NA, .pathology = NA)
.path |
Character. A path to the database file, e.g., "/Users/researcher/Downloads/McPAS-TCR.csv". |
.db |
Character. A database type: either "vdjdb", "vdjdb-search", "mcpas" or "tbadb". "vdjdb" for VDJDB; "vdjdb-search" for search table obtained from the web interface of VDJDB; "mcpas" for McPAS-TCR; "tbadb" for PIRD TBAdb. |
.species |
Character. A string or a vector of strings specifying which species need to be in the database, e.g., "HomoSapiens". Pass NA (by default) to load all available species. |
.chain |
Character. A string or a vector of strings specifying which chains need to be in the database, e.g., "TRB". Pass NA (by default) to load all available chains. |
.pathology |
Character. A string or a vector of strings specifying which disease, virus, bacteria or any condition needs to be in the database, e.g., "CMV". Pass NA (by default) to load all available conditions. |
Data frame with the input database records.
# Example file path file_path <- paste0(system.file(package = "immunarch"), "/extdata/db/vdjdb.example.txt") # Load the database with human-only TRB-only receptors for all known antigens db <- dbLoad(file_path, "vdjdb", "HomoSapiens", "TRB") db# Example file path file_path <- paste0(system.file(package = "immunarch"), "/extdata/db/vdjdb.example.txt") # Load the database with human-only TRB-only receptors for all known antigens db <- dbLoad(file_path, "vdjdb", "HomoSapiens", "TRB") db
Compute information-based estimates and distances.
entropy(.data, .base = 2, .norm = FALSE, .do.norm = NA, .laplace = 1e-12) kl_div(.alpha, .beta, .base = 2, .do.norm = NA, .laplace = 1e-12) js_div(.alpha, .beta, .base = 2, .do.norm = NA, .laplace = 1e-12, .norm.entropy = FALSE) cross_entropy(.alpha, .beta, .base = 2, .do.norm = NA, .laplace = 1e-12, .norm.entropy = FALSE)entropy(.data, .base = 2, .norm = FALSE, .do.norm = NA, .laplace = 1e-12) kl_div(.alpha, .beta, .base = 2, .do.norm = NA, .laplace = 1e-12) js_div(.alpha, .beta, .base = 2, .do.norm = NA, .laplace = 1e-12, .norm.entropy = FALSE) cross_entropy(.alpha, .beta, .base = 2, .do.norm = NA, .laplace = 1e-12, .norm.entropy = FALSE)
.data |
Numeric vector. Any distribution. |
.base |
Numeric. A base of logarithm. |
.norm |
Logical. If TRUE then normalises the entropy by the maximal value of the entropy. |
.do.norm |
If TRUE then normalises the input distributions to make them sum up to 1. |
.laplace |
Numeric. A value for the laplace correction. |
.alpha |
Numeric vector. A distribution of some random value. |
.beta |
Numeric vector. A distribution of some random value. |
.norm.entropy |
Logical. If TRUE then normalises the resulting value by the average entropy of input distributions. |
A numeric value.
P <- abs(rnorm(10)) Q <- abs(rnorm(10)) entropy(P) kl_div(P, Q) js_div(P, Q) cross_entropy(P, Q)P <- abs(rnorm(10)) Q <- abs(rnorm(10)) entropy(P) kl_div(P, Q) js_div(P, Q) cross_entropy(P, Q)
The fixVis is a built-in software tool for the manipulation
of plots, such as adjusting title text font and size, axes, and more. It is a powerful
tool designed to produce publication-ready plots with minimal amount of coding.
fixVis(.plot = NA)fixVis(.plot = NA)
.plot |
A ggplot2 plot. |
No return value because it is an application.
if (interactive()) { # Compute gene usage, visualise it and tweak via fixVis data(immdata) # load test data gu <- geneUsage(immdata$data) p <- vis(gu) fixVis(p) }if (interactive()) { # Compute gene usage, visualise it and tweak via fixVis data(immdata) # load test data gu <- geneUsage(immdata$data) p <- vis(gu) fixVis(p) }
WIP
gene_stats()gene_stats()
gene_stats returns all segment gene statistics
gene_stats() get_genes("hs.trbv", "segment")gene_stats() get_genes("hs.trbv", "segment")
An utility function to analyse the immune receptor gene usage
(IGHD, IGHJ, IDHV, IGIJ, IGKJ, IGKV, IGLJ, IGLV, TRAJ, TRAV, TRBD, etc.)
and statistics. For gene details run gene_stats().
geneUsage( .data, .gene = c("hs.trbv", "HomoSapiens.TRBJ", "macmul.IGHV"), .quant = c(NA, "count"), .ambig = c("inc", "exc", "maj"), .type = c("segment", "allele", "family"), .norm = FALSE )geneUsage( .data, .gene = c("hs.trbv", "HomoSapiens.TRBJ", "macmul.IGHV"), .quant = c(NA, "count"), .ambig = c("inc", "exc", "maj"), .type = c("segment", "allele", "family"), .norm = FALSE )
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections,or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.gene |
A character vector of length one with the name of the gene you want
to analyse of the specific species. If you provide a vector of different length, only the first element
will be used. The string should also contain the species of interest, for example, valid ".gene" arguments
are "hs.trbv", "HomoSapiens.TRBJ" or "macmul.IGHV". For details run |
.quant |
Selects the column with data to evaluate. Pass NA if you want to compute gene statistics at the clonotype level without re-weighting. Pass "count" to use the "Clones" column to weight genes by abundance of their corresponding clonotypes. |
.ambig |
An option to handle ambiguous gene assigments, e.g., "TRAV1,TRAV2".
We recommend to turn it on by passing "inc" (turned on by default). You can exclude data for the cases where there is no clear match for gene, include it for every supplied gene, or pick only first from the set. Set it to "exc", "inc" or "maj", respectively. |
.type |
Set the type of data to evaluate: "segment", "allele", or "family". |
.norm |
If TRUE then return proportions of genes. If FALSE then return counts of genes. |
A data frame with rows corresponding to gene segments and columns corresponding to the input samples.
data(immdata) gu <- geneUsage(immdata$data) vis(gu)data(immdata) gu <- geneUsage(immdata$data) vis(gu)
The geneUsageAnalysis() function deploys several
data analysis methods, including PCA, multidimensional scaling,
Jensen-Shannon divergence, k-means, hierarchical clustering, DBscan, and different
correlation coefficients.
geneUsageAnalysis( .data, .method = c("js+hclust", "pca+kmeans", "anova", "js+pca+kmeans"), .base = 2, .norm.entropy = FALSE, .cor = c("pearson", "kendall", "spearman"), .do.norm = TRUE, .laplace = 1e-12, .verbose = TRUE, .k = 2, .eps = 0.01, .perp = 1, .theta = 0.1 )geneUsageAnalysis( .data, .method = c("js+hclust", "pca+kmeans", "anova", "js+pca+kmeans"), .base = 2, .norm.entropy = FALSE, .cor = c("pearson", "kendall", "spearman"), .do.norm = TRUE, .laplace = 1e-12, .verbose = TRUE, .k = 2, .eps = 0.01, .perp = 1, .theta = 0.1 )
.data |
The |
.method |
A string that defines the type of analysis to perform. Can be "pca",
"mds", "js", "kmeans", "hclust", "dbscan" or "cor" if you want to calculate
correlation coefficient. In the latter case you have to provide |
.base |
A numerical value that defines the logarithm base for Jensen-Shannon divergence. |
.norm.entropy |
A logical value. Set TRUE to normalise your data if you haven't done it already. |
.cor |
A string that defines the correlation coefficient for analysis. Can be "pearson", "kendall" or "spearman". |
.do.norm |
A logical value. If TRUE it forces Laplace smoothing, if NA it checks if smoothing is necessary, if FALSE does nothing. |
.laplace |
The numeric value, which is used as a pseudocount for Laplace smoothing. |
.verbose |
A logical value. |
.k |
The number of clusters to create, passed as |
.eps |
A numerical value, DBscan epsylon parameter, see
|
.perp |
A numerical value, t-SNE perplexity, see |
.theta |
A numerical value, t-SNE theta parameter, see |
Depends on the last element in the .method string. See immunr_tsne for more info.
data(immdata) gu <- geneUsage(immdata$data, .norm = TRUE) geneUsageAnalysis(gu, "js+hclust", .verbose = FALSE) %>% vis()data(immdata) gu <- geneUsage(immdata$data, .norm = TRUE) geneUsageAnalysis(gu, "js+hclust", .verbose = FALSE) %>% vis()
Retrieves an update message for immunarch.
get_immunarch_news(datepoint = "latest")get_immunarch_news(datepoint = "latest")
datepoint |
A string specifying the update date. Use |
If datepoint is set to "latest", the function returns the most recent update.
Otherwise, specify the update date key (e.g., "Apr 2025") to retrieve that particular update.
If no matching update is found, a warning is issued along with available update keys.
A character string with the update details or a warning if the key is not found.
getKmers(.data, .k, .col = c("aa", "nt"), .coding = TRUE)getKmers(.data, .k, .col = c("aa", "nt"), .coding = TRUE)
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections,or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.k |
Integer. Length of k-mers. |
.col |
Character. Which column to use, pass "aa" (by default) for CDR3 amino acid sequence, pass "nt" for CDR3 nucleotide sequences. |
.coding |
Logical. If TRUE (by default) then removes all non-coding sequences from input data first. |
Data frame with two columns (k-mers and their counts).
data(immdata) kmers <- getKmers(immdata$data[[1]], 5) kmers %>% vis()data(immdata) kmers <- getKmers(immdata$data[[1]], 5) kmers %>% vis()
Get a character vector of samples' groups from the input metadata file
group_from_metadata(.by, .metadata, .sep = "; ")group_from_metadata(.by, .metadata, .sep = "; ")
.by |
Character vector. Specify a column or columns in the input metadata to group by. |
.metadata |
Metadata object. |
.sep |
Character vector. Defines a separator between groups if more than one group passed in |
Character vector with group names.
immunarch:::group_from_metadata("Status", data.frame(Status = c("A", "A", "B", "B", "C")))
A function to check if an input object has a specific class name.
has_class(.data, .class)has_class(.data, .class)
.data |
Any R object. |
.class |
Character vector. Specifies a class name to check against. |
Logical value.
tmp <- "abc" immunarch:::has_class(tmp, "new_class") tmp <- immunarch:::add_class(tmp, "new_class") immunarch:::has_class(tmp, "new_class")
A dataset with single chain TCR data for testing and examplatory purposes.
immdataimmdata
A list of two elements. The first element ("data") is a list with data frames with clonotype tables. The second element ("meta") is a metadata table.
List of immune repertoire data frames.
Metadata
...
Get a list of package updates
immunarch_v1_updatesimmunarch_v1_updates
An object of class list of length 1.
"Clones" - number of barcodes (events, UMIs) or reads;
"Proportion" - proportion of barcodes (events, UMIs) or reads;
"CDR3.nt" - CDR3 nucleotide sequence;
"CDR3.aa" - CDR3 amino acid sequence;
"V.name" - names of aligned Variable gene segments;
"D.name" - names of aligned Diversity gene segments or NA;
"J.name" - names of aligned Joining gene segments;
"V.end" - last positions of aligned V gene segments (1-based);
"D.start" - positions of D'5 end of aligned D gene segments (1-based);
"D.end" - positions of D'3 end of aligned D gene segments (1-based);
"J.start" - first positions of aligned J gene segments (1-based);
"VJ.ins" - number of inserted nucleotides (N-nucleotides) at V-J junction (-1 for receptors with VDJ recombination);
"VD.ins" - number of inserted nucleotides (N-nucleotides) at V-D junction (-1 for receptors with VJ recombination);
"DJ.ins" - number of inserted nucleotides (N-nucleotides) at D-J junction (-1 for receptors with VJ recombination);
"Sequence" - full nucleotide sequence.
Clusters the data with one of the following methods:
immunr_hclust clusters the data using the hierarchical clustering from hcut;
immunr_kmeans clusters the data using the K-means algorithm from kmeans;
immunr_dbscan clusters the data using the DBSCAN algorithm from dbscan.
immunr_hclust(.data, .k = 2, .k.max = nrow(.data) - 1, .method = "complete", .dist = TRUE) immunr_kmeans(.data, .k = 2, .k.max = as.integer(sqrt(nrow(.data))) + 1, .method = c("silhouette", "gap_stat")) immunr_dbscan(.data, .eps, .dist = TRUE)immunr_hclust(.data, .k = 2, .k.max = nrow(.data) - 1, .method = "complete", .dist = TRUE) immunr_kmeans(.data, .k = 2, .k.max = as.integer(sqrt(nrow(.data))) + 1, .method = c("silhouette", "gap_stat")) immunr_dbscan(.data, .eps, .dist = TRUE)
.data |
Matrix or data frame with features, distance matrix or output from repOverlapAnalysis or geneUsageAnalysis functions. |
.k |
The number of clusters to create, defined as |
.k.max |
Limits the maximum number of clusters. It is passed as |
.method |
Passed to factoextra::hcut or as factoextra::fviz_nbclust. In case of factoextra::hcut the agglomeration method is going to be used (argument In case of factoextra::fviz_nbclust it is the method to be used for estimating the optimal number of clusters (argument |
.dist |
If TRUE then ".data" is expected to be a distance matrix. If FALSE then the euclidean distance is computed for the input objects. |
.eps |
Local radius for expanding clusters, minimal distance between points to expand clusters. Passed as |
immunr_hclust - list with two elements. The first element is an output from factoextra::hcut.
The second element is an output from factoextra::fviz_nbclust
immunr_kmeans - list with three elements. The first element is an output from kmeans.
The second element is an output from factoextra::fviz_nbclust.
The third element is the input dataset .data.
immunr_dbscan - list with two elements. The first element is an output from fpc::dbscan.
The second element is the input dataset .data.
data(immdata) gu <- geneUsage(immdata$data, .norm = TRUE) immunr_hclust(t(as.matrix(gu[, -1])), .dist = FALSE) gu[is.na(gu)] <- 0 immunr_kmeans(t(as.matrix(gu[, -1])))data(immdata) gu <- geneUsage(immdata$data, .norm = TRUE) immunr_hclust(t(as.matrix(gu[, -1])), .dist = FALSE) gu[is.na(gu)] <- 0 immunr_kmeans(t(as.matrix(gu[, -1])))
Collects a set of principal variables, reducing the number of not important variables to analyse. Dimensionality reduction makes data analysis algorithms work faster and sometimes more accurate, since it also reduces noise in the data. Currently available methods are:
immunr_pca performs PCA (Principal Component Analysis) using prcomp;
immunr_mds performs MDS (Multi-Dimensional Scaling) using isoMODS from MASS package.
immunr_tsne performs tSNE (t-Distributed Stochastic Neighbour Embedding) using Rtsne Rtsne package.
immunr_pca(.data, .scale = default_scale_fun, .raw = TRUE, .orig = FALSE, .dist = FALSE) immunr_mds(.data, .scale = default_scale_fun, .raw = TRUE, .orig = FALSE, .dist = TRUE) immunr_tsne(.data, .perp = 1, .dist = TRUE, ...)immunr_pca(.data, .scale = default_scale_fun, .raw = TRUE, .orig = FALSE, .dist = FALSE) immunr_mds(.data, .scale = default_scale_fun, .raw = TRUE, .orig = FALSE, .dist = TRUE) immunr_tsne(.data, .perp = 1, .dist = TRUE, ...)
.data |
A matrix or a data frame with features, distance matrix or output from repOverlapAnalysis or geneUsageAnalysis functions. |
.scale |
A function to apply to your data before passing it to any of dimensionality reduction algorithms. There is no scaling by default. |
.raw |
If TRUE then returns the non-processed output from dimensionality reduction algorithms. Pass FALSE if you want to visualise results. |
.orig |
If TRUE then returns the original result from algorithms. Pass FALSE if you want to visualise results. |
.dist |
If TRUE then assumes that ".data" is a distance matrix. |
.perp |
The perplexity parameter for Rtsne. Specifies the number of neighbors each data point must have in the resulting plot. |
... |
Other parameters passed to Rtsne. |
immunr_pca - an output from prcomp.
immunr_mds - an output from isoMDS.
immunr_tsne - an output from Rtsne.
vis.immunr_pca for visualisations.
data(immdata) gu <- geneUsage(immdata$data) gu[is.na(gu)] <- 0 gu <- t(as.matrix(gu[, -1])) immunr_pca(gu) immunr_mds(dist(gu)) immunr_tsne(dist(gu))data(immdata) gu <- geneUsage(immdata$data) gu[is.na(gu)] <- 0 gu <- t(as.matrix(gu[, -1])) immunr_pca(gu) immunr_mds(dist(gu)) immunr_tsne(dist(gu))
For reference please look up https://www.pnas.org/content/111/16/5980 (Fig. 4).
inc_overlap( .data, .fun, .step = 1000, .n.steps = 10, .downsample = FALSE, .bootstrap = NA, .verbose.inc = TRUE, ... )inc_overlap( .data, .fun, .step = 1000, .n.steps = 10, .downsample = FALSE, .bootstrap = NA, .verbose.inc = TRUE, ... )
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections, or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.fun |
Function to compute overlaps. e.g., |
.step |
Either an integer or a numeric vector. In the first case, the integer defines the step of incremental overlap. In the second case, the vector encodes all repertoire sampling depths. |
.n.steps |
Integer. Number of steps if |
.downsample |
If TRUE then performs downsampling to N clonotypes at each step instead of choosing the top N clonotypes. |
.bootstrap |
Set NA to turn off any bootstrapping, set a number to perform bootstrapping with this number of tries. |
.verbose.inc |
Logical. If TRUE then shows the output from the computation process. |
... |
Other arguments passed to |
List with overlap matrices.
## Not run: data(immdata) ov <- repOverlap(immdata$data, "inc+overlap", .step = 100, .verbose.inc = FALSE, .verbose = FALSE) vis(ov) ## End(Not run)## Not run: data(immdata) ov <- repOverlap(immdata$data, "inc+overlap", .step = 100, .verbose.inc = FALSE, .verbose = FALSE) vis(ov) ## End(Not run)
Returns the list of available update keys for immunarch v1.
list_immunarch_news()list_immunarch_news()
A character vector containing all the date keys for the available updates.
Copy the upper matrix triangle to the lower one
matrixdiagcopy(.mat)matrixdiagcopy(.mat)
.mat |
Matrix. |
Matrix with its upper tri part copied to the lower tri part.
mat <- matrix(0, 3, 3) mat mat(1, 3) <- 1 mat <- immunarch:::matrixdiagcopy(mat) mat
public_matrix(.data)public_matrix(.data)
.data |
Public repertoire, an output from pubRep. |
Matrix with per-sample clonotype counts / proportions only.
data(immdata) immdata$data <- lapply(immdata$data, head, 2000) pr <- pubRep(immdata$data, .verbose = FALSE) pr.mat <- public_matrix(pr) dim(pr.mat) head(pr.mat)data(immdata) immdata$data <- lapply(immdata$data, head, 2000) pr <- pubRep(immdata$data, .verbose = FALSE) pr.mat <- public_matrix(pr) dim(pr.mat) head(pr.mat)
pubRep( .data, .col = "aa+v", .quant = c("count", "prop"), .coding = TRUE, .min.samples = 1, .max.samples = NA, .verbose = TRUE )pubRep( .data, .col = "aa+v", .quant = c("count", "prop"), .coding = TRUE, .min.samples = 1, .max.samples = NA, .verbose = TRUE )
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections, or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.col |
A string that specifies the column(s) to be processed. Outputs one of the following strings, separated by the plus sign: "nt" for nucleotide sequences, "aa" for amino acid sequences, "v" for V gene segments, "j" for J gene segments. E.g., pass "aa+v" to compute overlaps on CDR3 amino acid sequences paired with V gene segments, i.e., in this case a unique clonotype is a pair of CDR3 amino acid and V gene segment. |
.quant |
A string that specifies the column to be processed. Set "count" to see public clonotype sharing with the number of clones, set "prop" to see proportions. |
.coding |
Logical. If TRUE then preprocesses the data to filter out non-coding sequences. |
.min.samples |
Integer. A minimal number of samples a clonotype must have to be included in the public repertoire table. |
.max.samples |
Integer. A maxminal number of samples a clonotype must have to be included in the public repertoire table. Set NA (by default) to have the maximal amount of samples. |
.verbose |
Logical. If TRUE then outputs the progress. |
Data table with columns for:
Clonotypes (e.g., CDR3 sequence, or two columns for CDR3 sequence and V gene)
Incidence of clonotypes
Per-sample proportions or counts
# Subset the data to make the example faster to run immdata$data <- lapply(immdata$data, head, 2000) pr <- pubRep(immdata$data, .verbose = FALSE) vis(pr, "clonotypes", 1, 2)# Subset the data to make the example faster to run immdata$data <- lapply(immdata$data, head, 2000) pr <- pubRep(immdata$data, .verbose = FALSE) vis(pr, "clonotypes", 1, 2)
pubRepApply(.pr1, .pr2, .fun = function(x) log10(x[1])/log10(x[2]))pubRepApply(.pr1, .pr2, .fun = function(x) log10(x[1])/log10(x[2]))
.pr1 |
First public repertoire. |
.pr2 |
Second public repertoire. |
.fun |
A function to apply to pairs of frequencies of same clonotypes from "pr1" and "pr2".
By default - |
Work in progress.
data(immdata) immdata$data <- lapply(immdata$data, head, 2000) pr <- pubRep(immdata$data, .verbose = FALSE) pr1 <- pubRepFilter(pr, immdata$meta, .by = c(Status = "MS")) pr2 <- pubRepFilter(pr, immdata$meta, .by = c(Status = "C")) prapp <- pubRepApply(pr1, pr2) head(prapp)data(immdata) immdata$data <- lapply(immdata$data, head, 2000) pr <- pubRep(immdata$data, .verbose = FALSE) pr1 <- pubRepFilter(pr, immdata$meta, .by = c(Status = "MS")) pr2 <- pubRepFilter(pr, immdata$meta, .by = c(Status = "C")) prapp <- pubRepApply(pr1, pr2) head(prapp)
Filter our clonotypes with low incidence in a specific group.
pubRepFilter(.pr, .meta, .by, .min.samples = 1)pubRepFilter(.pr, .meta, .by, .min.samples = 1)
.pr |
Public repertoires, an output from pubRep. |
.meta |
Metadata file. |
.by |
Named character vector. Names of the group to filter by. |
.min.samples |
Integer. Filters out clonotypes with the number of samples below than this number. |
Data frame with filtered clonotypes.
data(immdata) immdata$data <- lapply(immdata$data, head, 2000) pr <- pubRep(immdata$data, .verbose = FALSE) pr1 <- pubRepFilter(pr, immdata$meta, .by = c(Status = "MS")) head(pr1)data(immdata) immdata$data <- lapply(immdata$data, head, 2000) pr <- pubRep(immdata$data, .verbose = FALSE) pr1 <- pubRepFilter(pr, immdata$meta, .by = c(Status = "MS")) head(pr1)
pubRepStatistics(.data, .by = NA, .meta = NA)pubRepStatistics(.data, .by = NA, .meta = NA)
.data |
Public repertoire, an output from the pubRep function. |
.by |
Work in Progress. |
.meta |
Work in Progress. |
Data frame with incidence statistics per sample.
data(immdata) immdata$data <- lapply(immdata$data, head, 2000) pr <- pubRep(immdata$data, .verbose = FALSE) pubRepStatistics(pr) %>% vis()data(immdata) immdata$data <- lapply(immdata$data, head, 2000) pr <- pubRep(immdata$data, .verbose = FALSE) pubRepStatistics(pr) %>% vis()
This function aligns all sequences (incliding germline) that belong to one clonal lineage and one cluster. After clustering and building the clonal lineage and germline, the next step is to analyze the degree of mutation and maturity of each clonal lineage. This allows for finding high mature cells and cells with a large number of offspring. The phylogenetic analysis will find mutations that increase the affinity of BCR. Making alignment of the sequence is the first step towards sequence analysis including BCR.
repAlignLineage(.data, .min_lineage_sequences, .prepare_threads, .align_threads, .nofail)repAlignLineage(.data, .min_lineage_sequences, .prepare_threads, .align_threads, .nofail)
.data |
The data to be processed. Can be data.frame, data.table::data.table or a list of these objects. |
.min_lineage_sequences |
If number of sequences in the same clonal lineage and the same cluster (not including germline) is lower than this threshold, this group of sequences will be filtered out from the dataframe; so only large enough lineages will be included. |
.prepare_threads |
Number of threads to prepare results table. Please note that high number can cause heavy memory usage! |
.align_threads |
Number of threads for lineage alignment. It must have columns in the immunarch compatible format immunarch_data_format, and also must contain 'Cluster' column, which is added by seqCluster() function, and 'Germline.sequence' column, which is added by repGermline() function. |
.nofail |
Will return NA instead of stopping if Clustal W is not installed. Used to avoid raising errors in examples on computers where Clustal W is not installed. |
Dataframe or list of dataframes (if input is a list with multiple samples). The dataframe has these columns:
Cluster: cluster name
Germline: germline sequence
Alignment: DNAbin object with alignment
Sequences: nested dataframe containing all sequences for this combination of cluster and germline; it has columns
Sequence, CDR1.nt, CDR2.nt, CDR3.nt, FR1.nt, FR2.nt, FR3.nt, FR4.nt, V.allele, J.allele, V.aa, J.aa: all values taken from the input dataframe
Clone.ID: taken from the input dataframe, or created (filled with row numbers) if missing
Clones: taken from the input dataframe, or created (filled with '1' values) if missing
data(bcrdata) bcr_data <- bcrdata$data bcr_data %>% seqCluster(seqDist(bcr_data), .fixed_threshold = 3) %>% repGermline(.threads = 1) %>% repAlignLineage(.min_lineage_sequences = 2, .align_threads = 2, .nofail = TRUE)data(bcrdata) bcr_data <- bcrdata$data bcr_data %>% seqCluster(seqDist(bcr_data), .fixed_threshold = 3) %>% repGermline(.threads = 1) %>% repAlignLineage(.min_lineage_sequences = 2, .align_threads = 2, .nofail = TRUE)
This function uses the PHYLIP package to make phylogenetic analysis. For making trees it uses maximum parsimony methods.
repClonalFamily(.data, .vis_groups, .threads, .nofail)repClonalFamily(.data, .vis_groups, .threads, .nofail)
.data |
The data to be processed, output of repAlignLineage() function. |
.vis_groups |
Groups for visualization, used to annotate specific clones on chart and display them in different colors. This is a named list, where names are for the chart legend, and list items are clone IDs that belong to the groups. It's not necessary to assign groups to all clonotypes; unassigned ones will be displayed on the chart as "Clonotype" category. It's also possible to assign multiple clonotypes to the same group by providing nested lists or vectors of clone IDs instead of single clone IDs. Example: .vis_groups = list(A = 817, B = 201, C = list(303, 42)) |
.threads |
Number of threads to use. |
.nofail |
Returns NA instead of stopping if PHYLIP is not installed. Used to avoid raising errors in examples on computers where PHYLIP is not installed. |
Dataframe or list of dataframes (if input is a list with multiple samples). The dataframe has these columns:
Cluster: cluster name
Germline.Input: germline sequence, like it was in the input; not aligned
Germline.Output: germline sequence, parsed from PHYLIP dnapars function output; it contains difference of germline from the common ancestor; "." characters mean matching letters
Common.Ancestor: common ancestor sequence, parsed from PHYLIP dnapars function output
Trunk.Length: mean trunk length, representing the distance between the most recent common ancestor and germline sequence as a measure of the maturity of a lineage
Tree: output tree in "phylo" format, loaded from by PHYLIP dnapars function output
TreeStats: nested dataframe containing data about tree nodes, needed for visualization
Sequences: nested dataframe containing all sequences for this combination of cluster and germline; it contains regions from original sequences, saved for repSomaticHypermutation() calculation, and also data needed for visualizations
data(bcrdata) bcr_data <- bcrdata$data bcr_data %>% seqCluster(seqDist(bcr_data), .fixed_threshold = 3) %>% repGermline(.threads = 1) %>% repAlignLineage(.min_lineage_sequences = 2, .align_threads = 2, .nofail = TRUE) %>% repClonalFamily(.threads = 1, .nofail = TRUE)data(bcrdata) bcr_data <- bcrdata$data bcr_data %>% seqCluster(seqDist(bcr_data), .fixed_threshold = 3) %>% repGermline(.threads = 1) %>% repAlignLineage(.min_lineage_sequences = 2, .align_threads = 2, .nofail = TRUE) %>% repClonalFamily(.threads = 1, .nofail = TRUE)
repClonality function encompasses several methods to measure
clonal proportions in a given repertoire.
repClonality( .data, .method = c("clonal.prop", "homeo", "top", "rare"), .perc = 10, .clone.types = c(Rare = 1e-05, Small = 1e-04, Medium = 0.001, Large = 0.01, Hyperexpanded = 1), .head = c(10, 100, 1000, 3000, 10000, 30000, 1e+05), .bound = c(1, 3, 10, 30, 100) )repClonality( .data, .method = c("clonal.prop", "homeo", "top", "rare"), .perc = 10, .clone.types = c(Rare = 1e-05, Small = 1e-04, Medium = 0.001, Large = 0.01, Hyperexpanded = 1), .head = c(10, 100, 1000, 3000, 10000, 30000, 1e+05), .bound = c(1, 3, 10, 30, 100) )
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections, or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.method |
A String with one of the following options: Set Set Set Set |
.perc |
A single numerical value ranging from 0 to 100. |
.clone.types |
A named numerical vector with the threshold of the half-closed intervals that mark off clonal groups. |
.head |
A numerical vector with ranges of the top clonotypes. |
.bound |
A numerical vector with ranges of abundance for the rare clonotypes in the dataset. |
Clonal proportion assessment is a different approach to estimate repertoire diversity. When visualised, it allows for thorough examination of immune repertoire structure and composition.
In its core this type of analysis is similar to the relative species abundance concept in ecology. Relative abundance is the percent composition of an organism of a particular kind relative to the total number of organisms in the area.
A stacked barplot of relative clonotype abundances can be therefore viewed as a non-parametric approach to comparing their underlying distributions.
If input data is a single immune repertoire, then the function returns a numeric vector with clonality statistics.
Otherwise, it returns a numeric matrix with clonality statistics for all input repertoires.
# Load the data data(immdata) imm_pr <- repClonality(immdata$data, .method = "clonal.prop") vis(imm_pr) imm_top <- repClonality(immdata$data, .method = "top", .head = c(10, 100, 1000, 3000, 10000)) vis(imm_top) imm_rare <- repClonality(immdata$data, .method = "rare") vis(imm_rare) imm_hom <- repClonality(immdata$data, .method = "homeo") vis(imm_hom)# Load the data data(immdata) imm_pr <- repClonality(immdata$data, .method = "clonal.prop") vis(imm_pr) imm_top <- repClonality(immdata$data, .method = "top", .head = c(10, 100, 1000, 3000, 10000)) vis(imm_top) imm_rare <- repClonality(immdata$data, .method = "rare") vis(imm_rare) imm_hom <- repClonality(immdata$data, .method = "homeo") vis(imm_hom)
This is a utility function to estimate the diversity of species or objects in the given distribution.
Note: functions will check if .data is a distribution of a random variable (sum == 1) or not. To force normalisation and / or to prevent this, set .do.norm to TRUE (do normalisation) or FALSE (don't do normalisation), respectively.
repDiversity( .data, .method = "chao1", .col = "aa", .max.q = 6, .min.q = 1, .q = 5, .step = NA, .quantile = c(0.025, 0.975), .extrapolation = NA, .perc = 50, .norm = TRUE, .verbose = TRUE, .do.norm = NA, .laplace = 0 )repDiversity( .data, .method = "chao1", .col = "aa", .max.q = 6, .min.q = 1, .q = 5, .step = NA, .quantile = c(0.025, 0.975), .extrapolation = NA, .perc = 50, .norm = TRUE, .verbose = TRUE, .do.norm = NA, .laplace = 0 )
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections, or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.method |
Picks a method used for estimation out of a following list: chao1, hill, div, gini.simp, inv.simp, gini, raref, d50, dxx. |
.col |
A string that specifies the column(s) to be processed. Pass one of the following strings, separated by the plus sign: "nt" for nucleotide sequences, "aa" for amino acid sequences, "v" for V gene segments, "j" for J gene segments. E.g., pass "aa+v" to compute diversity estimations on CDR3 amino acid sequences paired with V gene segments, i.e., in this case a unique clonotype is a pair of CDR3 amino acid and V gene segment. Clonal counts of equal clonotypes will be summed up. |
.max.q |
The max hill number to calculate (default: 5). |
.min.q |
Function calculates several hill numbers. Set the min (default: 1). |
.q |
q-parameter for the Diversity index. |
.step |
Rarefaction step's size. |
.quantile |
Numeric vector with quantiles for confidence intervals. |
.extrapolation |
An integer. An upper limit for the number of clones to extrapolate to. Pass 0 (zero) to turn extrapolation subroutines off. |
.perc |
Set the percent to dXX index measurement. |
.norm |
Normalises rarefaction curves. |
.verbose |
If TRUE then outputs progress. |
.do.norm |
One of the three values - NA, TRUE or FALSE. If NA then checks for distrubution (sum(.data) == 1) and normalises if needed with the given laplace correction value. if TRUE then does normalisation and laplace correction. If FALSE then doesn't do neither normalisaton nor laplace correction. |
.laplace |
A numeric value, which is used as a pseudocount for Laplace smoothing. |
True diversity, or the effective number of types, refers to the number of equally-abundant types needed for the average proportional abundance of the types to equal that observed in the dataset of interest where all types may not be equally abundant.
Inverse Simpson index is the effective number of types that is obtained when the weighted arithmetic mean is used to quantify average proportional abundance of types in the dataset of interest.
The Gini coefficient measures the inequality among values of a frequency distribution (for example levels of income). A Gini coefficient of zero expresses perfect equality, where all values are the same (for example, where everyone has the same income). A Gini coefficient of one (or 100 percents ) expresses maximal inequality among values (for example where only one person has all the income).
The Gini-Simpson index is the probability of interspecific encounter, i.e., probability that two entities represent different types.
Chao1 estimator is a nonparameteric asymptotic estimator of species richness (number of species in a population).
Rarefaction is a technique to assess species richness from the results of sampling through extrapolation.
Hill numbers are a mathematically unified family of diversity indices (differing among themselves only by an exponent q).
d50 is a recently developed immune diversity estimate. It calculates the minimum number of distinct clonotypes amounting to greater than or equal to 50 percent of a total of sequencing reads obtained following amplification and sequencing
dXX is a similar to d50 index where XX corresponds to desirable percent of total sequencing reads.
div, gini, gini.simp, inv.simp, raref return numeric vector of length 1 with value.
chao1 returns 4 values: estimated number of species, standart deviation of this number and two 95% confidence intervals for the species number.
hill returns a vector of specified length .max.q - .min.q
For most methods, if input data is a single immune repertoire, then the function returns a numeric vector with diversity statistics.
Otherwise, it returns a numeric matrix with diversity statistics for all input repertoires.
For Chao1 the function returns a matrix with diversity estimations.
For rarefaction the function returns either a matrix with diversity estimatinos on different step of the simulaiton process or a list with such matrices.
repOverlap, entropy, repClonality Rarefaction wiki https://en.wikipedia.org/wiki/Rarefaction_(ecology) Hill numbers paper https://www.uvm.edu/~ngotelli/manuscriptpdfs/ChaoHill.pdf Diversity wiki https://en.wikipedia.org/wiki/Measurement_of_biodiversity
data(immdata) # Make data smaller for testing purposes immdata$data <- top(immdata$data, 4000) # chao1 repDiversity(.data = immdata$data, .method = "chao1") %>% vis() # Hill numbers repDiversity( .data = immdata$data, .method = "hill", .max.q = 6, .min.q = 1, .do.norm = NA, .laplace = 0 ) %>% vis() # diversity repDiversity(.data = immdata$data, .method = "div", .q = 5, .do.norm = NA, .laplace = 0) %>% vis() # Gini-Simpson repDiversity(.data = immdata$data, .method = "gini.simp", .q = 5, .do.norm = NA, .laplace = 0) %>% vis() # inverse Simpson repDiversity(.data = immdata$data, .method = "inv.simp", .do.norm = NA, .laplace = 0) %>% vis() # Gini coefficient repDiversity(.data = immdata$data, .method = "gini", .do.norm = NA, .laplace = 0) # d50 repDiversity(.data = immdata$data, .method = "d50") %>% vis()data(immdata) # Make data smaller for testing purposes immdata$data <- top(immdata$data, 4000) # chao1 repDiversity(.data = immdata$data, .method = "chao1") %>% vis() # Hill numbers repDiversity( .data = immdata$data, .method = "hill", .max.q = 6, .min.q = 1, .do.norm = NA, .laplace = 0 ) %>% vis() # diversity repDiversity(.data = immdata$data, .method = "div", .q = 5, .do.norm = NA, .laplace = 0) %>% vis() # Gini-Simpson repDiversity(.data = immdata$data, .method = "gini.simp", .q = 5, .do.norm = NA, .laplace = 0) %>% vis() # inverse Simpson repDiversity(.data = immdata$data, .method = "inv.simp", .do.norm = NA, .laplace = 0) %>% vis() # Gini coefficient repDiversity(.data = immdata$data, .method = "gini", .do.norm = NA, .laplace = 0) # d50 repDiversity(.data = immdata$data, .method = "d50") %>% vis()
The repExplore function calculates the basic statistics of
repertoire: the number of unique immune receptor clonotypes, their relative abundances,
and sequence length distribution across the input dataset.
repExplore( .data, .method = c("volume", "count", "len", "clones"), .col = c("nt", "aa"), .coding = TRUE )repExplore( .data, .method = c("volume", "count", "len", "clones"), .col = c("nt", "aa"), .coding = TRUE )
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections, or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.method |
A string that specifies the method of analysis. It can be either "volume", "count", "len" or "clones". When .method is set to "volume" the repExplore calculates the number of unique clonotypes in the input data. When .method is set to "count" the repExplore calculates the distribution of clonotype abundances, i.e., how frequent receptors with different abundances are. When .method is set to "len" the repExplore calculates the distribution of CDR3 sequence lengths. When .method is set to "clones" the repExplore returns the number of clones (i.e., cells) per input repertoire. |
.col |
A string that specifies the column to be processed. Pass "nt" for nucleotide sequence or "aa" for amino acid sequence. |
.coding |
If |
If input data is a single immune repertoire, then the function returns a numeric vector with exploratory analysis statistics.
Otherwise, it returns a numeric matrix with exploratory analysis statistics for all input repertoires.
data(immdata) # Calculate statistics and generate a visual output with vis() repExplore(immdata$data, .method = "volume") %>% vis() repExplore(immdata$data, .method = "count") %>% vis() repExplore(immdata$data, .method = "len") %>% vis()data(immdata) # Calculate statistics and generate a visual output with vis() repExplore(immdata$data, .method = "volume") %>% vis() repExplore(immdata$data, .method = "count") %>% vis() repExplore(immdata$data, .method = "len") %>% vis()
repFilter( .data, .method = "by.clonotype", .query = list(CDR3.aa = exclude("partial", "out_of_frame")), .match = "exact" )repFilter( .data, .method = "by.clonotype", .query = list(CDR3.aa = exclude("partial", "out_of_frame")), .match = "exact" )
.data |
The data to be processed. Must be the list of 2 elements: a data table and a metadata table. |
.method |
Method of filtering. Implemented methods: by.meta, by.repertoire (by.rep), by.clonotype (by.cl) Default value: 'by.clonotype'. |
.query |
Filtering query. It's a named list of filters that will be applied to data. Possible values for names in this list are dependent on filter methods:
|
.match |
Matching method for "include" and "exclude" options in query. Possible values:
|
This function creates germlines for clonal lineages. B cell clonal lineage represents a set of B cells that presumably have a common origin (arising from the same VDJ rearrangement event) and a common ancestor. Each clonal lineage has its own germline sequence that represents the ancestral sequence for each BCR in clonal lineage. In other words, germline sequence is a sequence of B-cells immediately after VDJ recombination, before B-cell maturation and hypermutation process. Germline sequence is useful for assessing the degree of mutation and maturity of the repertoire.
repGermline(.data, .species, .min_nuc_outside_cdr3, .threads)repGermline(.data, .species, .min_nuc_outside_cdr3, .threads)
.data |
The data to be processed. Can be data.frame, data.table::data.table or a list of these objects. It must have columns in the immunarch compatible format immunarch_data_format. |
.species |
Species from which the data was acquired. Available options: "HomoSapiens" (default), "MusMusculus", "BosTaurus", "CamelusDromedarius", "CanisLupusFamiliaris", "DanioRerio", "MacacaMulatta", "MusMusculusDomesticus", "MusMusculusCastaneus", "MusMusculusMolossinus", "MusMusculusMusculus", "MusSpretus", "OncorhynchusMykiss", "OrnithorhynchusAnatinus", "OryctolagusCuniculus", "RattusNorvegicus", "SusScrofa". |
.min_nuc_outside_cdr3 |
This parameter sets how many nucleotides should have V or J chain outside of CDR3 to be considered good for further alignment. |
.threads |
Number of threads to use. |
Data with added columns:
Sequence (FR1+CDR1+FR2+CDR2+FR3+CDR3+FR4 in nucleotides; the column will be replaced if exists)
V.allele, J.allele (chosen alleles of V and J genes),
V.aa, J.aa (V and J sequences from original clonotype, outside CDR3, converted to amino acids)
Germline.sequence (combined germline nucleotide sequence)
data(bcrdata) bcrdata$data %>% top(5) %>% repGermline()data(bcrdata) bcrdata$data %>% top(5) %>% repGermline()
The repLoad function loads repertoire files
into R workspace in the immunarch format where you can immediately use them for
the analysis. repLoad automatically detects the right format for
your files, so all you need is simply provide the path to your files.
See "Details" for more information on supported formats. See "Examples" for diving right into it.
repLoad(.path, .mode = "paired", .coding = TRUE, ...)repLoad(.path, .mode = "paired", .coding = TRUE, ...)
.path |
A character string specifying the path to the input data. Input data can be one of the following:
|
.mode |
Either "single" for single chain data or "paired" for paired chain data. Currently "single" works for every format, and "paired" works only for 10X Genomics data. By default, 10X Genomics data will be loaded as paired chain data, and other files will be loaded as single chain data. |
.coding |
A logical value. Set TRUE to get coding-only clonotypes (by defaul). Set FALSE to get all clonotypes. |
... |
Extra arguments for parsing functions |
The metadata has to be a tab delimited file with first column named "Sample". It can have any number of additional columns with arbitrary names. The first column should contain base names of files without extensions in your folder. Example:
| Sample | Sex | Age | Status |
| immunoseq_1 | M | 1 | C |
| immunoseq_2 | M | 2 | C |
| immunoseq_3 | FALSE | 3 | A |
Currently, Immunarch support the following formats:
"immunoseq" - ImmunoSEQ of any version. http://www.adaptivebiotech.com/immunoseq
"mitcr" - MiTCR. https://github.com/milaboratory/mitcr
"mixcr" - MiXCR (the "all" files) of any version. https://github.com/milaboratory/mixcr
"migec" - MiGEC. http://migec.readthedocs.io/en/latest/
"migmap" - For parsing IgBLAST results postprocessed with MigMap. https://github.com/mikessh/migmap
"tcr" - tcR, our previous package. https://imminfo.github.io/tcr/
"vdjtools" - VDJtools of any version. http://vdjtools-doc.readthedocs.io/en/latest/
"imgt" - IMGT HighV-QUEST. http://www.imgt.org/HighV-QUEST/
"airr" - adaptive immune receptor repertoire (AIRR) data format. http://docs.airr-community.org/en/latest/datarep/overview.html
"10x" - 10XGenomics clonotype annotations tables. https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/output/annotation
"archer" - ArcherDX clonotype tables. https://archerdx.com/
A list with two named elements:
"data" is a list of input samples;
"meta" is a data frame with sample metadata.
immunr_data_format for immunarch data format; repSave for file saving; repOverlap, geneUsage and repDiversity for starting with immune repertoires basic statistics.
# To load the data from a single file (note that you don't need to specify the data format): file_path <- paste0(system.file(package = "immunarch"), "/extdata/io/Sample1.tsv.gz") immdata <- repLoad(file_path) # Suppose you have a following structure in your folder: # >_ ls # immunoseq1.txt # immunoseq2.txt # immunoseq3.txt # metadata.txt # To load the whole folder with every file in it type: file_path <- paste0(system.file(package = "immunarch"), "/extdata/io/") immdata <- repLoad(file_path) print(names(immdata)) # We recommend creating a metadata file named "metadata.txt" in the folder. # In that case, when you load your data you will see: # > immdata <- repLoad("path/to/your/folder/") # > names(immdata) # [1] "data" "meta" # If you do not have "metadata.txt", you will see the same output, # but your metadata will be almost empty: # > immdata <- repLoad("path/to/your/folder/") # > names(immdata) # [1] "data" "meta"# To load the data from a single file (note that you don't need to specify the data format): file_path <- paste0(system.file(package = "immunarch"), "/extdata/io/Sample1.tsv.gz") immdata <- repLoad(file_path) # Suppose you have a following structure in your folder: # >_ ls # immunoseq1.txt # immunoseq2.txt # immunoseq3.txt # metadata.txt # To load the whole folder with every file in it type: file_path <- paste0(system.file(package = "immunarch"), "/extdata/io/") immdata <- repLoad(file_path) print(names(immdata)) # We recommend creating a metadata file named "metadata.txt" in the folder. # In that case, when you load your data you will see: # > immdata <- repLoad("path/to/your/folder/") # > names(immdata) # [1] "data" "meta" # If you do not have "metadata.txt", you will see the same output, # but your metadata will be almost empty: # > immdata <- repLoad("path/to/your/folder/") # > names(immdata) # [1] "data" "meta"
The repOverlap function is designed to analyse the overlap between
two or more repertoires. It contains a number of methods to compare immune receptor
sequences that are shared between individuals.
repOverlap( .data, .method = c("public", "overlap", "jaccard", "tversky", "cosine", "morisita", "inc+public", "inc+morisita"), .col = "aa", .a = 0.5, .b = 0.5, .verbose = TRUE, .step = 1000, .n.steps = 10, .downsample = FALSE, .bootstrap = NA, .verbose.inc = NA, .force.matrix = FALSE )repOverlap( .data, .method = c("public", "overlap", "jaccard", "tversky", "cosine", "morisita", "inc+public", "inc+morisita"), .col = "aa", .a = 0.5, .b = 0.5, .verbose = TRUE, .step = 1000, .n.steps = 10, .downsample = FALSE, .bootstrap = NA, .verbose.inc = NA, .force.matrix = FALSE )
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections, or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.method |
A string that specifies the method of analysis or a combination of
methods. The |
.col |
A string that specifies the column(s) to be processed. Pass one of the following strings, separated by the plus sign: "nt" for nucleotide sequences, "aa" for amino acid sequences, "v" for V gene segments, "j" for J gene segments. E.g., pass "aa+v" to compute overlaps on CDR3 amino acid sequences paired with V gene segments, i.e., in this case a unique clonotype is a pair of CDR3 amino acid and V gene segment. Clonal counts of equal clonotypes will be summed up. |
.a, .b
|
Alpha and beta parameters for Tversky Index. Default values give the Jaccard index measure. |
.verbose |
if TRUE then output the progress. |
.step |
Either an integer or a numeric vector. In the first case, the integer defines the step of incremental overlap. In the second case, the vector encodes all repertoire sampling depths. |
.n.steps |
Skipped if ".step" is a numeric vector. |
.downsample |
If TRUE then performs downsampling to N clonotypes at each step instead of choosing the top N clonotypes in incremental overlaps. Change nothing of you are using conventional methods. |
.bootstrap |
Set NA to turn off any bootstrapping, set a number to perform bootstrapping with this number of tries. |
.verbose.inc |
Logical. If TRUE then shows output from the computation process. |
.force.matrix |
Logical. If TRUE then always forces the matrix output even in case of two input repertoires. |
"public" and "shared" are synonyms that exist for the convenience of researchers.
The "overlap" coefficient is a similarity measure that measures the overlap between two finite sets.
The "jaccard" index is conceptually a percentage of how many objects two sets have in common out of how many objects they have total.
The "tversky" index is an asymmetric similarity measure on sets that compares a variant to a prototype.
The "cosine" index is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them.
The "morisita" index measures how many times it is more likely to randomly select two sampled points from the same quadrat (the dataset is covered by a regular grid of changing size) then it would be in the case of a random distribution generated from a Poisson process. Duplicate objects are merged with their counts are summed up.
In most cases the return value is a matrix with overlap values for each pair of repertoires.
If only two repertoires were provided, return value is single numeric value.
If one of the incremental method is chosen, return list of overlap matrix.
data(immdata) # Make data smaller for testing purposes immdata$data <- top(immdata$data, 4000) ov <- repOverlap(immdata$data, .verbose = FALSE) vis(ov) ov <- repOverlap(immdata$data, "jaccard", .verbose = FALSE) vis(ov, "heatmap2")data(immdata) # Make data smaller for testing purposes immdata$data <- top(immdata$data, 4000) ov <- repOverlap(immdata$data, .verbose = FALSE) vis(ov) ov <- repOverlap(immdata$data, "jaccard", .verbose = FALSE) vis(ov, "heatmap2")
The repOverlapAnalysis() function contains advanced data
analysis methods. You can use several clustering and dimensionality reduction
techniques in order to investigate further the difference between repertoires
provided.
To cluster a subset of similar data with repOverlapAnalysis() you can
perform hierarchical clustering, k-means or dbscan ('hclust', 'kmeans', 'dbscan'
respectively).
To reduce dimensions, for example, to select features for subsequent analysis, you can execute the multidimensional scaling or t-sne algorithms ('mds' and 'tsne' respectively).
repOverlapAnalysis( .data, .method = ("hclust"), .scale = default_scale_fun, .raw = TRUE, .perp = 1, .theta = 0.1, .eps = 0.01, .k = 2 )repOverlapAnalysis( .data, .method = ("hclust"), .scale = default_scale_fun, .raw = TRUE, .perp = 1, .theta = 0.1, .eps = 0.01, .k = 2 )
.data |
Any distance matrix between pairs of repertoires. You can also pass your
output from |
.method |
A string that defines the type of analysis to perform. |
.scale |
A function to scale the data before passing it to the MDS algorithm. |
.raw |
A logical value. Set TRUE if you want to receive raw output of clustering
or dimensionality reduction function of choice. Set FALSE if you want to receive
processed output that can be subjected to visualisation with |
.perp |
A numerical value, t-SNE parameter, see |
.theta |
A numerical value, t-SNE parameter, see |
.eps |
A numerical value, DBscan epsylon parameter, see |
.k |
The number of clusters to create, passed as |
Depends on the last element in the .method string. See immunr_tsne for more info.
data(immdata) ov <- repOverlap(immdata$data) repOverlapAnalysis(ov, "mds+hclust") %>% vis()data(immdata) ov <- repOverlap(immdata$data) repOverlapAnalysis(ov, "mds+hclust") %>% vis()
Sample (downsample) repertoires using different approches.
repSample( .data, .method = c("downsample", "resample", "sample"), .n = NA, .prob = TRUE )repSample( .data, .method = c("downsample", "resample", "sample"), .n = NA, .prob = TRUE )
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections, or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.method |
Character. Name of a sampling method. See "Details" for more details. Default value is "downsample" that downsamples the repertoires to the number of clones (i.e., reads / UMIs) that the smallest repertoire has, if user doesn't set any value to the ".n" argument. |
.n |
Integer. Number of clones / clonotypes / reads / UMIs to choose, depending on the method. Set NA to sample repertoires to the size of the smallest repertoire in the ".data". |
.prob |
Logical. If TRUE then samples the clonotypes with probability weights equal to their number of clones. Used only if ".method" is "sample". |
If .method is "downsample" then repSample chooses .n clones (not clonotypes!) from the input repertoires without any probabilistic simulation,
but exactly computing each choosed clones. Such approach is is more consistent and biologically pleasant than
an output from the function if .method is "resample".
If .method is "resample" then repSample uses multinomial distribution to compute the number of occurences for each cloneset.
then it removes zero-number clonotypes and return the resulting data frame. Probabilities for rmultinom for each cloneset
is a percentage of this cloneset in the "Proportion" column. It's a some sort of simulation of how clonotypes are chosen from the organisms.
if .method is "sample" then repSample chooses .n clonotypes (not clones!) randomly. Depending on the
.prob argument, the function chooses clonotypes either according to their size (if .prob is TRUE, by default),
or each clonotype has an equal chance to be choosed (if .prob is FALSE). Note that sampling is done without replacing.
Subsampled immune repertoire or a list of subsampled immune repertoires.
data(immdata) # Downsampling to 1000 clones (not clonotypes!) tmp <- repSample(immdata$data[[1]], .n = 1000) sum(tmp$Clones) # Downsampling to 1000 clonotypes tmp <- repSample(immdata$data[[1]], "sample", .n = 1000) nrow(tmp) # Downsampling to the smallest repertoire by clones (not clonotypes!) tmp <- repSample(immdata$data[c(1, 2)]) sum(tmp[[1]]$Clones) sum(tmp[[2]]$Clones) # Downsampling to the smallest repertoire by clonotypes tmp <- repSample(immdata$data[c(1, 2)], "sample") nrow(tmp[[1]]$Clones) nrow(tmp[[2]]$Clones)data(immdata) # Downsampling to 1000 clones (not clonotypes!) tmp <- repSample(immdata$data[[1]], .n = 1000) sum(tmp$Clones) # Downsampling to 1000 clonotypes tmp <- repSample(immdata$data[[1]], "sample", .n = 1000) nrow(tmp) # Downsampling to the smallest repertoire by clones (not clonotypes!) tmp <- repSample(immdata$data[c(1, 2)]) sum(tmp[[1]]$Clones) sum(tmp[[2]]$Clones) # Downsampling to the smallest repertoire by clonotypes tmp <- repSample(immdata$data[c(1, 2)], "sample") nrow(tmp[[1]]$Clones) nrow(tmp[[2]]$Clones)
The repSave function is deigned to save your data to the disk
in desirable format. Currently supports "immunarch" and "vdjtools" file formats.
repSave(.data, .path, .format = c("immunarch", "vdjtools"), .compress = TRUE)repSave(.data, .path, .format = c("immunarch", "vdjtools"), .compress = TRUE)
.data |
An R dataframe, a list of R dataframes or a list with |
.path |
A string with the path to the output directory. It should include file name if a single dataframe is provided to .data argument. |
.format |
A string with desirable format specification. Current options are "immunarch" and "vdjtools". |
.compress |
A boolean value. Defines whether the output will be compressed or not. |
It is not necessary to create directories beforehand. If the provided directory does not exist it will be created automatically.
No return value.
## Not run: data(immdata) # Reduce data to save time on examples immdata$data <- map(immdata$data, ~ .x %>% head(10)) dirpath <- tempdir() # Save the list of repertoires repSave(immdata, dirpath) # Load it and check if it is the same new_immdata <- repLoad(dirpath) # sum(immdata$data[[1]] != new_immdata$data[[1]], na.rm = TRUE) # sum(immdata$data[[2]] != new_immdata$data[[2]], na.rm = TRUE) # sum(immdata$meta != new_immdata$meta, na.rm = TRUE) ## End(Not run)## Not run: data(immdata) # Reduce data to save time on examples immdata$data <- map(immdata$data, ~ .x %>% head(10)) dirpath <- tempdir() # Save the list of repertoires repSave(immdata, dirpath) # Load it and check if it is the same new_immdata <- repLoad(dirpath) # sum(immdata$data[[1]] != new_immdata$data[[1]], na.rm = TRUE) # sum(immdata$data[[2]] != new_immdata$data[[2]], na.rm = TRUE) # sum(immdata$meta != new_immdata$meta, na.rm = TRUE) ## End(Not run)
A family of functions to quantify public or shared receptors between repertoire.
Supported methods are the following.
repsim_intersection - number of shared receptors between
each pair of repertoires (intersection size). Handy for quick overlap heatmaps,
QC of replicate similarity, or spotting donor-shared "public" clonotypes.
repsim_jaccard - Jaccard similarity of receptor
sets between repertoires ( / ). Best when comparing cohorts with
different sizes to get a scale-invariant overlap score.
repsim_intersection( idata, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) repsim_jaccard( idata, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") )repsim_intersection( idata, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") ) repsim_jaccard( idata, autojoin = getOption("immundata.autojoin", TRUE), format = c("long", "wide") )
idata |
An |
autojoin |
Logical. If TRUE, join repertoire metadata by the schema repertoire id.
Change the default behaviour by calling |
format |
String. One of |
repsim_intersectionA symmetric numeric matrix where rows/columns are repertoire_id and each
cell is the count of shared unique receptors. The diagonal contains per-repertoire
richness (total unique receptors). Row/column names are repertoire IDs.
repsim_jaccardA symmetric numeric matrix where rows/columns are repertoire_id and each
cell is the Jaccard similarity in [0, 1]. The diagonal is 1. Row/column
names are repertoire IDs.
# Limit the number of threads used by the underlying DB for this session. # Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers). db_exec("SET threads TO 1") # Load data ## Not run: immdata <- get_test_idata() |> agg_repertoires("Therapy") ## End(Not run) # # repsim_intersection # ## Not run: m_pub <- repsim_intersection(immdata) ## End(Not run) # # repsim_jaccard # ## Not run: m_jac <- repsim_jaccard(immdata) ## End(Not run)# Limit the number of threads used by the underlying DB for this session. # Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers). db_exec("SET threads TO 1") # Load data ## Not run: immdata <- get_test_idata() |> agg_repertoires("Therapy") ## End(Not run) # # repsim_intersection # ## Not run: m_pub <- repsim_intersection(immdata) ## End(Not run) # # repsim_jaccard # ## Not run: m_jac <- repsim_jaccard(immdata) ## End(Not run)
This function aligns V and J genes from the germline in each cluster with corresponding genes in each clonotype, saves the alignments for purpose of visualization, and calculates number of mutations for each clonotype.
repSomaticHypermutation(.data, .threads, .nofail)repSomaticHypermutation(.data, .threads, .nofail)
.data |
The data to be processed: an output of repClonalFamily(); variants with one sample and list of samples are both supported. |
.threads |
Number of threads to use. |
.nofail |
Will return NA instead of stopping if Clustal W is not installed. Used to avoid raising errors in examples on computers where Clustal W is not installed. |
Dataframe or list of dataframes (if input is a list with multiple samples). The dataframe has all the columns from repClonalFamily() output dataframe, with Sequence column unnested: the resulting dataframe has one line per clonotype. Clone.ID column contains original IDs for clonotypes, and can be used as dataframe key. New columns are added:
Germline.Alignment.V: contains V gene alignment of current clonotype with the germline
Germline.Alignment.J: contains J gene alignment of current clonotype with the germline
Substitutions: contains number of substitutions in the alignment (summary for V and J)
Insertions: contains number of insertions in the clonotype relative to germline (summary for V and J)
Deletions: contains number of deletions in the clonotype relative to germline (summary for V and J)
Mutations: contains total number of mutations in the alignment (summary for V and J)
data(bcrdata) bcr_data <- bcrdata$data bcr_data %>% seqCluster(seqDist(bcr_data), .fixed_threshold = 3) %>% repGermline(.threads = 1) %>% repAlignLineage(.min_lineage_sequences = 2, .align_threads = 2, .nofail = TRUE) %>% repClonalFamily(.threads = 1, .nofail = TRUE) %>% repSomaticHypermutation(.threads = 1, .nofail = TRUE)data(bcrdata) bcr_data <- bcrdata$data bcr_data %>% seqCluster(seqDist(bcr_data), .fixed_threshold = 3) %>% repGermline(.threads = 1) %>% repAlignLineage(.min_lineage_sequences = 2, .align_threads = 2, .nofail = TRUE) %>% repClonalFamily(.threads = 1, .nofail = TRUE) %>% repSomaticHypermutation(.threads = 1, .nofail = TRUE)
Subsets the input immune repertoire by barcodes. Creates a vector of barcodes to subset or a vector cluster IDs and corresponding barcodes to get a list of immune repertoires corresponding to cluster IDs. Columns with clonotype counts and proportions are changed accordingly to the filtered barcodes.
select_barcodes(.data, .barcodes, .force.list = FALSE)select_barcodes(.data, .barcodes, .force.list = FALSE)
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections, or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.barcodes |
Either a character vector with barcodes or a named character/factor vector with
barcodes as names and cluster IDs a vector elements. The output of Seurat's |
.force.list |
Logical. If TRUE then always returns a list, even if the result is one data frame. |
An immune repertoire (if ".barcodes" is a barcode vector) or a list of immune repertoires (if ".barcodes" is named vector or an output from Seurat::Idents()). Each element is an immune repertoire with clonotype barcodes corresponding to the input barcodes. The output list names are cluster names in the ".barcode" argument (Seurat::Idents() case only).
## Not run: data(immdata) # Create a fake single-cell data df <- immdata$data[[1]] df$Barcode <- "AAAAACCCCC" df$Barcode[51:nrow(df)] <- "GGGGGCCCCC" barcodes <- "AAAAACCCCC" df <- select_barcodes(df, barcodes) nrow(df) ## End(Not run)## Not run: data(immdata) # Create a fake single-cell data df <- immdata$data[[1]] df$Barcode <- "AAAAACCCCC" df$Barcode[51:nrow(df)] <- "GGGGGCCCCC" barcodes <- "AAAAACCCCC" df <- select_barcodes(df, barcodes) nrow(df) ## End(Not run)
Given the vector of barcodes from Seurat, splits the input repertoires to separate subsets following the barcodes' assigned IDs. Useful in case you want to split immune repertoires by patients or clusters.
select_clusters(.data, .clusters, .field = "Cluster")select_clusters(.data, .clusters, .field = "Cluster")
.data |
List of two elements "data" and "meta", with "data" being a list of immune repertoires, and "meta" being a metadata table. |
.clusters |
Factor vector with barcodes as vector names and cluster IDs as vector elements.
The output of the Seurat |
.field |
A string specifying the name of the field in the input metadata. New immune repertoire subsets will have cluster IDs in this field. |
A list with two elements "data" and "meta" with updated immune repertoire tables and metadata.
## Not run: library(Seurat) Idents(pbmc_small) new_cluster_ids <- c("A", "B", "C") new_cluster_ids <- levels(pbmc_small) new_cluster_ids pbmc_small <- RenameIdents(pbmc_small, new_cluster_ids) ## End(Not run)## Not run: library(Seurat) Idents(pbmc_small) new_cluster_ids <- c("A", "B", "C") new_cluster_ids <- levels(pbmc_small) new_cluster_ids pbmc_small <- RenameIdents(pbmc_small, new_cluster_ids) ## End(Not run)
Graph clustering based on distances between sequences
seqCluster(.data, .dist, .perc_similarity, .nt_similarity, .fixed_threshold)seqCluster(.data, .dist, .perc_similarity, .nt_similarity, .fixed_threshold)
.data |
The data which was used to caluculate .dist object. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format immunarch_data_format |
.dist |
List of distance objects produced with seqDist function. |
.perc_similarity |
Numeric value between 0 and 1 specifying the maximum acceptable weight of an edge in a graph. This threshold depends on the length of sequences. |
.nt_similarity |
Numeric between 0-sequence length specifying the threshold of allowing a 1 in n nucleotides mismatch in sequencies. |
.fixed_threshold |
Numeric specifying the threshold on the maximum weight of an edge in a graph. |
Immdata data format object. Same as .data, but with extra 'Cluster' column with clusters assigned.
## Not run: data(immdata) # In this example, we will use only 2 samples with 500 clonotypes in each for time saving input_data <- lapply(immdata$data[1:2], head, 500) dist_result <- seqDist(input_data) cluster_result <- seqCluster(input_data, dist_result, .fixed_threshold = 1) ## End(Not run)## Not run: data(immdata) # In this example, we will use only 2 samples with 500 clonotypes in each for time saving input_data <- lapply(immdata$data[1:2], head, 500) dist_result <- seqDist(input_data) cluster_result <- seqCluster(input_data, dist_result, .fixed_threshold = 1) ## End(Not run)
Computing sequential distances between clonotypes from two repertoires:
seqDist(.data, .col = 'CDR3.nt', .method = 'hamming', .group_by = c("V.name", "J.name"), .group_by_seqLength = TRUE, .trim_genes = TRUE, ...)seqDist(.data, .col = 'CDR3.nt', .method = 'hamming', .group_by = c("V.name", "J.name"), .group_by_seqLength = TRUE, .trim_genes = TRUE, ...)
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format immunarch_data_format |
.col |
A string that specifies the column name to be processed. The default value is 'CDR3.nt'. |
.method |
Character value or user-defined function. |
.group_by |
Character vector of column names to group sequence by. The default value is c("V.first", "J.first"). Columns "V.first" and "J.first" containing first genes without allele suffixes are calculated automatically from "V.name" and "J.name" if absent in the data. Pass NA for no grouping options. |
.group_by_seqLength |
If TRUE - adds grouping by sequence length of .col argument |
.trim_genes |
If TRUE - use only general gene values (e.g. "IGHV1-18") of .group_by columns for clustering; if FALSE - can cause very small clusters in case of high resolution genotyping |
... |
Extra arguments for user-defined function. The default value is Other possible values are:
In case of user-defined function, it should take x and y parameters as input and return dist object. |
Named list of list with dist objects for given repertoires for each combination of .group_by variable(s) and/or sequence length of .col.
## Not run: data(immdata) # Reducing data to save time on examples immdata$data <- map(immdata$data, ~ .x %>% head(10)) # Computing hamming distance for the first two repertoires in `'immdata'` seqDist(immdata$data[1:2]) # Here we define a custom distance function # that will count the difference in number of characters in sequences. f <- function(x, y) { res <- matrix(nrow = length(x), ncol = length(y)) for (i in 1:length(x)) { res[i, ] <- abs(nchar(x[i]) - nchar(y)) } dimnames(res) <- list(x, y) return(as.dist(res)) } seqDist(immdata$data[1:2], .method = f, .group_by_seqLength = FALSE) ## End(Not run)## Not run: data(immdata) # Reducing data to save time on examples immdata$data <- map(immdata$data, ~ .x %>% head(10)) # Computing hamming distance for the first two repertoires in `'immdata'` seqDist(immdata$data[1:2]) # Here we define a custom distance function # that will count the difference in number of characters in sequences. f <- function(x, y) { res <- matrix(nrow = length(x), ncol = length(y)) for (i in 1:length(x)) { res[i, ] <- abs(nchar(x[i]) - nchar(y)) } dimnames(res) <- list(x, y) return(as.dist(res)) } seqDist(immdata$data[1:2], .method = f, .group_by_seqLength = FALSE) ## End(Not run)
Set and update progress bars
set_pb(.max) add_pb(.pb, .value = 1)set_pb(.max) add_pb(.pb, .value = 1)
.max |
Integer. Maximal value of the progress bar. |
.pb |
Progress bar object from |
.value |
Numeric. Value to add to the progress bar at each step. |
An updated progress bar.
pb <- immunarch:::set_pb(100) immunarch:::add_pb(pb, 25) immunarch:::add_pb(pb, 25) immunarch:::add_pb(pb, 25) immunarch:::add_pb(pb, 25) close(pb)
spectratype(.data, .quant = c("id", "count"), .col = "nt")spectratype(.data, .quant = c("id", "count"), .col = "nt")
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections, or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.quant |
Select the column with clonal counts to evaluate. Set to "id" to count every clonotype once. Set to "count" to take into the account number of clones per clonotype. |
.col |
A string that specifies the column(s) to be processed. The output is one of the following strings, separated by the plus sign: "nt" for nucleotide sequences, "aa" for amino acid sequences, "v" for V gene segments, "j" for J gene segments. E.g., pass "aa+v" for spectratyping on CDR3 amino acid sequences paired with V gene segments, i.e., in this case a unique clonotype is a pair of CDR3 amino acid and V gene segment. Clonal counts of equal clonotypes will be summed up. |
Data frame with distributions of clonotypes per CDR3 length.
# Load the data data(immdata) sp <- spectratype(immdata$data[[1]], .col = "aa+v") vis(sp)# Load the data data(immdata) sp <- spectratype(immdata$data[[1]], .col = "aa+v") vis(sp)
split_to_kmers(.data, .k) kmer_profile(.data, .method = c("freq", "prob", "wei", "self"), .remove.stop = TRUE)split_to_kmers(.data, .k) kmer_profile(.data, .method = c("freq", "prob", "wei", "self"), .remove.stop = TRUE)
.data |
Character vector or the output from |
.k |
Integer. Size of k-mers. |
.method |
Character vector of length one. If "freq" then returns a position frequency matrix (PFM) - a matrix with occurences of each amino acid in each position. If "prob" then returns a position probability matrix (PPM) - a matrix with probabilities of occurences of each amino acid in each position. This is a traditional representation of sequence motifs. If "wei" then returns a position weight matrix (PWM) - a matrix with log likelihoods of PPM elements. If "self" then returns a matrix with self-information of elements in PWM. For more information see https://en.wikipedia.org/wiki/Position_weight_matrix. |
.remove.stop |
Logical. If TRUE (by default) remove stop codons. |
split_to_kmers - Data frame with two columns (k-mers and their counts).
kmer_profile - a matrix with per-position amino acid statistics.
data(immdata) kmers <- getKmers(immdata$data[[1]], 5) kmer_profile(kmers) %>% vis()data(immdata) kmers <- getKmers(immdata$data[[1]], 5) kmer_profile(kmers) %>% vis()
Return a column's name
switch_type(type) process_col_argument(.col)switch_type(type) process_col_argument(.col)
type |
Character. Specifies the column to choose: "nt" chooses the CDR3 nucleotide column, "aa" chooses the CDR3 amino acid column, "v" chooses the V gene segment column, "j" chooses the J gene segment column. |
.col |
A string that specifies the column(s) to be processed. Select one of the following strings, separated by the plus sign: "nt" for nucleotide sequences, "aa" for amino acid sequences, "v" for V gene segments, "j" for J gene segments. |
A column's name.
immunarch:::switch_type("nuc") immunarch:::switch_type("v")
Get the N most abundant clonotypes
top(.data, .n = 10)top(.data, .n = 10)
.data |
The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections, or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.n |
Numeric. Number of the most abundant clonotypes to return. |
Data frame with the .n most abundant clonotypes only.
data(immdata) top(immdata$data) top(immdata$data[[1]])data(immdata) top(immdata$data) top(immdata$data[[1]])
Tracks the temporal dynamics of clonotypes in repertoires. For example, tracking across multiple time points after vaccination.
Note: duplicated clonotypes are merged and their counts are summed up.
trackClonotypes(.data, .which = list(1, 15), .col = "aa", .norm = TRUE)trackClonotypes(.data, .which = list(1, 15), .col = "aa", .norm = TRUE)
.data |
The data to process. It can be a data.frame, a data.table::data.table, or a list of these objects. Every object must have columns in the immunarch compatible format. immunarch_data_format Competent users may provide advanced data representations: DBI database connections, or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire. |
.which |
An argument that regulates which clonotypes to choose for tracking. There are three options for this argument:
See the "Examples" below with examples for each option. |
.col |
A character vector of length 1. Specifies an identifier for a column, from which the function chooses clonotype sequences. Specify "nt" for nucleotide sequences, "aa" for amino acid sequences, "aa+v" for amino acid sequences and Variable genes, "nt+j" for nucleotide sequences with Joining genes, or any combination of the above. Used only if ".which" has option 1) or option 2). |
.norm |
Logical. If TRUE then uses Proportion instead of the number of Clones per clonotype to store in the function output. |
Data frame with input sequences and counts or proportions for each of the input repertoire.
# Load an example data that comes with immunarch data(immdata) # Make the data smaller in order to speed up the examples immdata$data <- immdata$data[c(1, 2, 3, 7, 8, 9)] immdata$meta <- immdata$meta[c(1, 2, 3, 7, 8, 9), ] # Option 1 # Choose the first 10 amino acid clonotype sequences # from the first repertoire to track tc <- trackClonotypes(immdata$data, list(1, 10), .col = "aa") # Choose the first 20 nucleotide clonotype sequences # and their V genes from the "MS1" repertoire to track tc <- trackClonotypes(immdata$data, list("MS1", 20), .col = "nt+v") # Option 2 # Choose clonotypes with amino acid sequences "CASRGLITDTQYF" or "CSASRGSPNEQYF" tc <- trackClonotypes(immdata$data, c("CASRGLITDTQYF", "CSASRGSPNEQYF"), .col = "aa") # Option 3 # Choose the first 10 clonotypes from the first repertoire # with amino acid sequences and V segments target <- immdata$data[[1]] %>% select(CDR3.aa, V.name) %>% head(10) tc <- trackClonotypes(immdata$data, target) # Visualise the output regardless of the chosen option # Therea are three way to visualise it, regulated by the .plot argument vis(tc, .plot = "smooth") vis(tc, .plot = "area") vis(tc, .plot = "line") # Visualising timepoints # First, we create an additional column in the metadata with randomly choosen timepoints: immdata$meta$Timepoint <- sample(1:length(immdata$data)) immdata$meta # Next, we create a vector with samples in the right order, # according to the "Timepoint" column (from smallest to greatest): sample_order <- order(immdata$meta$Timepoint) # Sanity check: timepoints are following the right order: immdata$meta$Timepoint[sample_order] # Samples, sorted by the timepoints: immdata$meta$Sample[sample_order] # And finally, we visualise the data: vis(tc, .order = sample_order)# Load an example data that comes with immunarch data(immdata) # Make the data smaller in order to speed up the examples immdata$data <- immdata$data[c(1, 2, 3, 7, 8, 9)] immdata$meta <- immdata$meta[c(1, 2, 3, 7, 8, 9), ] # Option 1 # Choose the first 10 amino acid clonotype sequences # from the first repertoire to track tc <- trackClonotypes(immdata$data, list(1, 10), .col = "aa") # Choose the first 20 nucleotide clonotype sequences # and their V genes from the "MS1" repertoire to track tc <- trackClonotypes(immdata$data, list("MS1", 20), .col = "nt+v") # Option 2 # Choose clonotypes with amino acid sequences "CASRGLITDTQYF" or "CSASRGSPNEQYF" tc <- trackClonotypes(immdata$data, c("CASRGLITDTQYF", "CSASRGSPNEQYF"), .col = "aa") # Option 3 # Choose the first 10 clonotypes from the first repertoire # with amino acid sequences and V segments target <- immdata$data[[1]] %>% select(CDR3.aa, V.name) %>% head(10) tc <- trackClonotypes(immdata$data, target) # Visualise the output regardless of the chosen option # Therea are three way to visualise it, regulated by the .plot argument vis(tc, .plot = "smooth") vis(tc, .plot = "area") vis(tc, .plot = "line") # Visualising timepoints # First, we create an additional column in the metadata with randomly choosen timepoints: immdata$meta$Timepoint <- sample(1:length(immdata$data)) immdata$meta # Next, we create a vector with samples in the right order, # according to the "Timepoint" column (from smallest to greatest): sample_order <- order(immdata$meta$Timepoint) # Sanity check: timepoints are following the right order: immdata$meta$Timepoint[sample_order] # Samples, sorted by the timepoints: immdata$meta$Sample[sample_order] # And finally, we visualise the data: vis(tc, .order = sample_order)
vis() is a lightweight, quick-look plotting helper.
It's designed to help you visualise results fast with sensible defaults.
It automatically detects the input type and chooses an appropriate visualisation
(e.g., output from airr_stats_genes() is recognised as gene usage values and plotted
without extra arguments).
vis() is not intended for publication-quality figures. For serious, highly customised, or publication-ready plots, I recommend building your graphics directly with ggplot2.
vis(.data, ...)vis(.data, ...)
.data |
The output from any immunarch analysis function. The function automatically resolves to a correct visualisation. |
... |
Any other arguments, see the "Details" section for specific visualisation functions. |
List of available visualisations for different kinds of data - will be available soon.
A ggplot2 object.
fixVis for precise manipulation of plots.
## Not run: immdata <- get_test_idata() |> agg_repertoires("Therapy") airr_stats_genes(immdata, gene_col = "v_call") |> vis() airr_public_jaccard(immdata) |> vis() airr_diversity_pielou(immdata) |> vis() airr_diversity_chao1(immdata) |> vis() ## End(Not run)## Not run: immdata <- get_test_idata() |> agg_repertoires("Therapy") airr_stats_genes(immdata, gene_col = "v_call") |> vis() airr_public_jaccard(immdata) |> vis() airr_diversity_pielou(immdata) |> vis() airr_diversity_chao1(immdata) |> vis() ## End(Not run)
vis_bar( .data, .by = NA, .meta = NA, .errorbars = c(0.025, 0.975), .errorbars.off = FALSE, .stack = FALSE, .points = TRUE, .test = TRUE, .signif.label.size = 3.5, .errorbar.width = 0.2, .defgroupby = "Sample", .grouping.var = "Group", .labs = c("X", "Y"), .title = "Barplot (.title argument)", .subtitle = "Subtitle (.subtitle argument)", .legend = NA, .leg.title = "Legend (.leg.title argument)", .legend.pos = "right", .rotate_x = 90 )vis_bar( .data, .by = NA, .meta = NA, .errorbars = c(0.025, 0.975), .errorbars.off = FALSE, .stack = FALSE, .points = TRUE, .test = TRUE, .signif.label.size = 3.5, .errorbar.width = 0.2, .defgroupby = "Sample", .grouping.var = "Group", .labs = c("X", "Y"), .title = "Barplot (.title argument)", .subtitle = "Subtitle (.subtitle argument)", .legend = NA, .leg.title = "Legend (.leg.title argument)", .legend.pos = "right", .rotate_x = 90 )
.data |
Data to visualise. |
.by |
Pass NA if you want to plot samples without grouping. You can pass a character vector with one or several column names from ".meta" to group your data before plotting. In this case you should provide ".meta". You can pass a character vector that exactly matches the number of samples in your data, each value should correspond to a sample's property. It will be used to group data based on the values provided. Note that in this case you should pass NA to ".meta". |
.meta |
A metadata object. An R dataframe with sample names and their properties, such as age, serostatus or hla. |
.errorbars |
A numeric vector of length two with quantiles for error bars on sectors. Disabled if ".errorbars.off" is TRUE. |
.errorbars.off |
If TRUE then plot CI bars for distances between each group. Disabled if no group passed to the ".by" argument. |
.stack |
If TRUE and .errorbars.off is TRUE then plot stacked bar plots for each Group or Sample |
.points |
A logical value defining whether points will be visualised or not. |
.test |
A logical vector whether statistical tests should be applied. See "Details" for more information. |
.signif.label.size |
An integer value defining the size of text for p-value. |
.errorbar.width |
Numeric. Width for error bars. |
.defgroupby |
A name for the column with sample names. |
.grouping.var |
A name for the column to group by. |
.labs |
A character vector of length two specifying names for x-axis and y-axis. |
.title |
The text for the plot's title. |
.subtitle |
The text for the plot's subtitle. |
.legend |
If TRUE then displays a legend, otherwise removes legend from the plot. |
.leg.title |
The text for the plots's legend. Provide NULL to remove the legend's title completely. |
.legend.pos |
Positions of the legend: either "top", "bottom", "left" or "right". |
.rotate_x |
How much the x tick text should be rotated? In angles. |
A ggplot2 object.
## Not run: vis_bar(data.frame(Sample = c("A", "B", "C"), Value = c(1, 2, 3))) ## End(Not run)## Not run: vis_bar(data.frame(Sample = c("A", "B", "C"), Value = c(1, 2, 3))) ## End(Not run)
Visualisation of distributions using ggplot2-based boxplots.
vis_box( .data, .by = NA, .meta = NA, .melt = TRUE, .points = TRUE, .test = TRUE, .signif.label.size = 3.5, .defgroupby = "Sample", .grouping.var = "Group", .labs = c("X", "Y"), .title = "Boxplot (.title argument)", .subtitle = "Subtitle (.subtitle argument)", .legend = NA, .leg.title = "Legend (.leg.title argument)", .legend.pos = "right" )vis_box( .data, .by = NA, .meta = NA, .melt = TRUE, .points = TRUE, .test = TRUE, .signif.label.size = 3.5, .defgroupby = "Sample", .grouping.var = "Group", .labs = c("X", "Y"), .title = "Boxplot (.title argument)", .subtitle = "Subtitle (.subtitle argument)", .legend = NA, .leg.title = "Legend (.leg.title argument)", .legend.pos = "right" )
.data |
Input matrix or data frame. |
.by |
Pass NA if you want to plot samples without grouping. You can pass a character vector with one or several column names from ".meta" to group your data before plotting. In this case you should provide ".meta". You can pass a character vector that exactly matches the number of samples in your data, each value should correspond to a sample's property. It will be used to group data based on the values provided. Note that in this case you should pass NA to ".meta". |
.meta |
A metadata object. An R dataframe with sample names and their properties, such as age, serostatus or hla. |
.melt |
If TRUE then apply reshape2::melt to the ".data" before plotting. In this case ".data" is supposed to be a data frame with the first character column reserved for names of genes and other numeric columns reserved to counts or frequencies of genes. Each numeric column should be associated with a specific repertoire sample. |
.points |
A logical value defining whether points will be visualised or not. |
.test |
A logical vector whether statistical tests should be applied. See "Details" for more information. |
.signif.label.size |
An integer value defining the size of text for p-value. |
.defgroupby |
A name for the column with sample names. |
.grouping.var |
A name for the column to group by. |
.labs |
Character vector of length two with names for x-axis and y-axis, respectively. |
.title |
The text for the title of the plot. |
.subtitle |
The The text for the plot's subtitle. |
.legend |
If TRUE then displays a legend, otherwise removes legend from the plot. |
.leg.title |
The The text for the plots's legend. Provide NULL to remove the legend's title completely. |
.legend.pos |
Positions of the legend: either "top", "bottom", "left" or "right". |
A ggplot2 object.
vis.immunr_gene_usage, geneUsage
## Not run: vis_box(data.frame(Sample = sample(c("A", "B", "C"), 100, TRUE), Value = rnorm(100)), .melt = FALSE) ## End(Not run)## Not run: vis_box(data.frame(Sample = sample(c("A", "B", "C"), 100, TRUE), Value = rnorm(100)), .melt = FALSE) ## End(Not run)
Visualise matrices with the circlize::chordDiagram function from the circlize package.
vis_circos(.data, .title = NULL, ...)vis_circos(.data, .title = NULL, ...)
.data |
Input matrix. |
.title |
The The text for the title of the plot. |
... |
Other arguments passed to circlize::chordDiagram from the 'circlize' package. |
A circlize object.
## Not run: data(immdata) ov <- repOverlap(immdata$data) vis(ov, .plot = "circos") ## End(Not run)## Not run: data(immdata) ov <- repOverlap(immdata$data) vis(ov, .plot = "circos") ## End(Not run)
Fast and easy visualisations of matrices or data frames with functions based on the ggplot2 package.
vis_heatmap( .data, .text = TRUE, .scientific = FALSE, .signif.digits = 2, .text.size = 4, .axis.text.size = NULL, .labs = c("Sample", "Sample"), .title = "Overlap", .leg.title = "Overlap values", .legend = TRUE, .na.value = NA, .transpose = FALSE, ... )vis_heatmap( .data, .text = TRUE, .scientific = FALSE, .signif.digits = 2, .text.size = 4, .axis.text.size = NULL, .labs = c("Sample", "Sample"), .title = "Overlap", .leg.title = "Overlap values", .legend = TRUE, .na.value = NA, .transpose = FALSE, ... )
.data |
Input object: a matrix or a data frame. If matrix: column names and row names (if presented) will be used as names for labs. If data frame: the first column will be used for row names and removed from the data. Other columns will be used for values in the heatmap. |
.text |
If TRUE then plots values in the heatmap cells. If FALSE does not plot values, just plot coloured cells instead. |
.scientific |
If TRUE then uses the scientific notation for numbers (e.g., "2.0e+2"). |
.signif.digits |
Number of significant digits to display on plot. |
.text.size |
Size of text in the cells of heatmap. |
.axis.text.size |
Size of text on the axis labels. |
.labs |
A character vector of length two with names for x-axis and y-axis, respectively. |
.title |
The The text for the plot's title. |
.leg.title |
The The text for the plots's legend. Provide NULL to remove the legend's title completely. |
.legend |
If TRUE then displays a legend, otherwise removes the legend from the plot. |
.na.value |
Replace NA values with this value. By default they remain NA. |
.transpose |
Logical. If TRUE then switch rows and columns. |
... |
Other passed arguments. |
A ggplot2 object.
## Not run: data(immdata) ov <- repOverlap(immdata$data) vis_heatmap(ov) gu <- geneUsage(immdata$data, "hs.trbj") vis_heatmap(gu) ## End(Not run)## Not run: data(immdata) ov <- repOverlap(immdata$data) vis_heatmap(ov) gu <- geneUsage(immdata$data, "hs.trbj") vis_heatmap(gu) ## End(Not run)
Visualise matrices with the functions based on the pheatmap package with minimum amount of arguments.
vis_heatmap2( .data, .meta = NA, .by = NA, .title = NA, .color = colorRampPalette(c("#67001f", "#d6604d", "#f7f7f7", "#4393c3", "#053061"))(1024), ... )vis_heatmap2( .data, .meta = NA, .by = NA, .title = NA, .color = colorRampPalette(c("#67001f", "#d6604d", "#f7f7f7", "#4393c3", "#053061"))(1024), ... )
.data |
Input matrix. Column names and row names (if presented) will be used as names for labs. |
.meta |
A metadata object. An R dataframe with sample names and their properties, such as age, serostatus or hla. |
.by |
Set NA if you want to plot samples without grouping. |
.title |
The text for the plot's title (same as the "main" argument in pheatmap). |
.color |
A vector specifying the colors (same as the "color" argument in pheatmap). Pass NA to use the default pheatmap colors. |
... |
Other arguments for the pheatmap function. |
A pheatmap object.
## Not run: data(immdata) ov <- repOverlap(immdata$data) vis_heatmap2(ov) ## End(Not run)## Not run: data(immdata) ov <- repOverlap(immdata$data) vis_heatmap2(ov) ## End(Not run)
Visualisation of distributions using ggplot2-based histograms.
vis_hist( .data, .by = NA, .meta = NA, .title = "Gene usage", .ncol = NA, .points = TRUE, .test = TRUE, .coord.flip = FALSE, .grid = FALSE, .labs = c("Gene", NA), .melt = TRUE, .legend = NA, .add.layer = NULL, ... )vis_hist( .data, .by = NA, .meta = NA, .title = "Gene usage", .ncol = NA, .points = TRUE, .test = TRUE, .coord.flip = FALSE, .grid = FALSE, .labs = c("Gene", NA), .melt = TRUE, .legend = NA, .add.layer = NULL, ... )
.data |
Input matrix or data frame. |
.by |
Pass NA if you want to plot samples without grouping. You can pass a character vector with one or several column names from ".meta" to group your data before plotting. In this case you should provide ".meta". You can pass a character vector that exactly matches the number of samples in your data, each value should correspond to a sample's property. It will be used to group data based on the values provided. Note that in this case you should pass NA to ".meta". |
.meta |
A metadata object. An R dataframe with sample names and their properties, such as age, serostatus or hla. |
.title |
The text for the title of the plot. |
.ncol |
A number of columns to display. Provide NA (by default) if you want the function to automatically detect the optimal number of columns. |
.points |
A logical value defining whether points will be visualised or not. |
.test |
A logical vector whether statistical tests should be applied. See "Details" for more information. |
.coord.flip |
If TRUE then swap x- and y-axes. |
.grid |
If TRUE then plot separate visualisations for each sample. |
.labs |
A character vector of length two with names for x-axis and y-axis, respectively. |
.melt |
If TRUE then apply reshape2::melt to the ".data" before plotting. In this case ".data" is supposed to be a data frame with the first character column reserved for names of genes and other numeric columns reserved to counts or frequencies of genes. Each numeric column should be associated with a specific repertoire sample. |
.legend |
If TRUE then plots the legend. If FALSE removes the legend from the plot. If NA automatically detects the best way to display legend. |
.add.layer |
Addditional ggplot2 layers, that added to each plot in the output plot or grid of plots. |
... |
Is not used here. |
If data is grouped, then statistical tests for comparing means of groups will be performed, unless .test = FALSE is supplied.
In case there are only two groups, the Wilcoxon rank sum test (https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test) is performed
(R function wilcox.test() with an argument exact = FALSE) for testing if there is a difference in mean rank values between two groups.
In case there more than two groups, the Kruskal-Wallis test (https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance) is performed (R function kruskal.test()), that is equivalent to ANOVA for ranks and it tests whether samples from different groups originated from the same distribution.
A significant Kruskal-Wallis test indicates that at least one sample stochastically dominates one other sample.
Adjusted for multiple comparisons P-values are plotted on the top of groups.
P-value adjusting is done using the Holm method (https://en.wikipedia.org/wiki/Holm%E2%80%93Bonferroni_method) (also known as Holm-Bonferroni correction).
You can execute the command ?p.adjust in the R console to see more.
A ggplot2 object.
vis.immunr_gene_usage, geneUsage
## Not run: data(immdata) imm_gu <- geneUsage(immdata$data[[1]]) vis(imm_gu, .plot = "hist", .add.layer = theme(axis.text.x = element_text(angle = 75, vjust = 1)) ) imm_gu <- geneUsage(immdata$data[1:4]) vis(imm_gu, .plot = "hist", .grid = TRUE, .add.layer = theme(axis.text.x = element_text(angle = 75, vjust = 1)) ) ## End(Not run)## Not run: data(immdata) imm_gu <- geneUsage(immdata$data[[1]]) vis(imm_gu, .plot = "hist", .add.layer = theme(axis.text.x = element_text(angle = 75, vjust = 1)) ) imm_gu <- geneUsage(immdata$data[1:4]) vis(imm_gu, .plot = "hist", .grid = TRUE, .add.layer = theme(axis.text.x = element_text(angle = 75, vjust = 1)) ) ## End(Not run)
vis_immunr_kmer_profile_main(.data, .plot, ...)vis_immunr_kmer_profile_main(.data, .plot, ...)
.data |
Kmer data, an output from kmer_profile. |
.plot |
String specifying the plot type:
|
... |
Other arguments passed to vis_textlogo or vis_seqlogo, depending on the ".plot" argument. |
A ggplot2 object.
## Not run: data(immdata) getKmers(immdata$data[[1]], 5) %>% kmer_profile() %>% vis("seqlogo") ## End(Not run)## Not run: data(immdata) getKmers(immdata$data[[1]], 5) %>% kmer_profile() %>% vis("seqlogo") ## End(Not run)
Visualise correlation of public clonotype frequencies in pairs of repertoires.
vis_public_clonotypes( .data, .x.rep = NA, .y.rep = NA, .title = NA, .ncol = 3, .point.size.modif = 1, .cut.axes = TRUE, .density = TRUE, .lm = TRUE, .radj.size = 3.5 )vis_public_clonotypes( .data, .x.rep = NA, .y.rep = NA, .title = NA, .ncol = 3, .point.size.modif = 1, .cut.axes = TRUE, .density = TRUE, .lm = TRUE, .radj.size = 3.5 )
.data |
Public repertoire data - an output from the pubRep function. |
.x.rep |
Either indices of samples or character vector of sample names for the x-axis. Must be of the same length as ".y.rep". |
.y.rep |
Either indices of samples or character vector of sample names for the y-axis. Must be of the same length as ".x.rep". |
.title |
The text for the title of the plot. |
.ncol |
An integer number of columns to print in the grid of pairs of repertoires. |
.point.size.modif |
An integer value that is a modifier of the point size. The larger the number, the larger the points. |
.cut.axes |
If TRUE then axes limits become shorter. |
.density |
If TRUE then displays density plot for distributions of clonotypes for each sample. If FALSE then removes density plot from the visualisation. |
.lm |
If TRUE then fit a linear model and displays an R adjusted coefficient that shows how similar samples are in terms of shared clonotypes. |
.radj.size |
An integer value, that defines the size of the The text for the R adjusted coefficient. |
A ggplot2 object.
pubRep, vis.immunr_public_repertoire
## Not run: data(immdata) pr <- pubRep(immdata$data, .verbose = FALSE) vis(pr, "clonotypes", 1, 2) ## End(Not run)## Not run: data(immdata) pr <- pubRep(immdata$data, .verbose = FALSE) vis(pr, "clonotypes", 1, 2) ## End(Not run)
Visualise public clonotype frequencies.
vis_public_frequencies( .data, .by = NA, .meta = NA, .type = c("boxplot", "none", "mean") )vis_public_frequencies( .data, .by = NA, .meta = NA, .type = c("boxplot", "none", "mean") )
.data |
Public repertoire - an output from the pubRep function. |
.by |
Pass NA if you want to plot samples without grouping. You can pass a character vector with one or several column names from ".meta" to group your data before plotting. In this case you should provide ".meta". You can pass a character vector that exactly matches the number of samples in your data, each value should correspond to a sample's property. It will be used to group data based on the values provided. Note that in this case you should pass NA to ".meta". |
.meta |
A metadata object. An R dataframe with sample names and their properties, such as age, serostatus or hla. |
.type |
Character. Either "boxplot" for plotting distributions of frequencies, "none" for plotting everything, or "mean" for plotting average values only. |
A ggplot2 object.
## Not run: data(immdata) immdata$data <- lapply(immdata$data, head, 500) pr <- pubRep(immdata$data, .verbose = FALSE) vis(pr, "freq", .type = "boxplot") vis(pr, "freq", .type = "none") vis(pr, "freq", .type = "mean") vis(pr, "freq", .by = "Status", .meta = immdata$meta) ## End(Not run)## Not run: data(immdata) immdata$data <- lapply(immdata$data, head, 500) pr <- pubRep(immdata$data, .verbose = FALSE) vis(pr, "freq", .type = "boxplot") vis(pr, "freq", .type = "none") vis(pr, "freq", .type = "mean") vis(pr, "freq", .by = "Status", .meta = immdata$meta) ## End(Not run)
Plot sequence logo plots for visualising of amino acid motif sequences / profiles.
vis_textlogo plots sequences in a text format - each letter has the same height. Useful when there
are no big differences between occurences of amino acids in the motif.
vis_seqlogo is a traditional sequence logo plots. Useful when there are one or two amino acids
with clear differences in their occurrences.
vis_textlogo(.data, .replace.zero.with.na = TRUE, .width = 0.1, ...) vis_seqlogo(.data, .scheme = "chemistry", ...)vis_textlogo(.data, .replace.zero.with.na = TRUE, .width = 0.1, ...) vis_seqlogo(.data, .scheme = "chemistry", ...)
.data |
Output from the |
.replace.zero.with.na |
if TRUE then replace all zeros with NAs, therefore letters with zero frequency wont appear at the plot. |
.width |
Width for jitter, i.e., how much points will scatter around the verical line. Pass 0 (zero) to plot points on the straight vertical line for each position. |
... |
Not used here. |
.scheme |
Character. An argument passed to geom_logo from ggseqlogo package specifying how to colour symbols. |
A ggplot2 object.
## Not run: data(immdata) kmers <- getKmers(immdata$data[[1]], 5) ppm <- kmer_profile(kmers, "prob") vis(ppm, .plot = "text") vis(ppm, .plot = "seq") d <- kmer_profile(c("CASLL", "CASSQ", "CASGL")) vis_textlogo(d) vis_seqlogo(d) ## End(Not run)## Not run: data(immdata) kmers <- getKmers(immdata$data[[1]], 5) ppm <- kmer_profile(kmers, "prob") vis(ppm, .plot = "text") vis(ppm, .plot = "seq") d <- kmer_profile(c("CASLL", "CASSQ", "CASGL")) vis_textlogo(d) vis_seqlogo(d) ## End(Not run)
## S3 method for class 'clonal_family' vis(.data, ...)## S3 method for class 'clonal_family' vis(.data, ...)
.data |
Clonal families from 1 or multiple samples: |
... |
Not used here. |
A ggraph object.
## Not run: data(bcrdata) bcr_data <- bcrdata$data clonal_family <- bcr_data %>% seqCluster(seqDist(bcr_data), .fixed_threshold = 3) %>% repGermline(.threads = 1) %>% repAlignLineage(.min_lineage_sequences = 2, .align_threads = 2, .nofail = TRUE) %>% repClonalFamily(.threads = 1, .nofail = TRUE) %>% vis() ## End(Not run)## Not run: data(bcrdata) bcr_data <- bcrdata$data clonal_family <- bcr_data %>% seqCluster(seqDist(bcr_data), .fixed_threshold = 3) %>% repGermline(.threads = 1) %>% repAlignLineage(.min_lineage_sequences = 2, .align_threads = 2, .nofail = TRUE) %>% repClonalFamily(.threads = 1, .nofail = TRUE) %>% vis() ## End(Not run)
## S3 method for class 'clonal_family_tree' vis(.data, ...)## S3 method for class 'clonal_family_tree' vis(.data, ...)
.data |
Single clonal family tree data from 1 cluster: 1 element from TreeStats column from |
... |
Not used here. |
A ggraph object.
## Not run: data(bcrdata) bcr_data <- bcrdata$data clonal_family <- bcr_data %>% seqCluster(seqDist(bcr_data), .fixed_threshold = 3) %>% repGermline(.threads = 1) %>% repAlignLineage(.min_lineage_sequences = 2, .align_threads = 2, .nofail = TRUE) %>% repClonalFamily(.threads = 1, .nofail = TRUE) # This condition can be omitted; it prevents the example from crashing # when ClustalW or PHYLIP are not installed if (!("step_failure_ignored" %in% class(clonal_family))) { vis(clonal_family[["full_clones"]][["TreeStats"]][[2]]) } ## End(Not run)## Not run: data(bcrdata) bcr_data <- bcrdata$data clonal_family <- bcr_data %>% seqCluster(seqDist(bcr_data), .fixed_threshold = 3) %>% repGermline(.threads = 1) %>% repAlignLineage(.min_lineage_sequences = 2, .align_threads = 2, .nofail = TRUE) %>% repClonalFamily(.threads = 1, .nofail = TRUE) # This condition can be omitted; it prevents the example from crashing # when ClustalW or PHYLIP are not installed if (!("step_failure_ignored" %in% class(clonal_family))) { vis(clonal_family[["full_clones"]][["TreeStats"]][[2]]) } ## End(Not run)
An utility function to visualise the output from repDiversity().
## S3 method for class 'immunr_chao1' vis( .data, .by = NA, .meta = NA, .errorbars = c(0.025, 0.975), .errorbars.off = FALSE, .points = TRUE, .test = TRUE, .signif.label.size = 3.5, ... )## S3 method for class 'immunr_chao1' vis( .data, .by = NA, .meta = NA, .errorbars = c(0.025, 0.975), .errorbars.off = FALSE, .points = TRUE, .test = TRUE, .signif.label.size = 3.5, ... )
.data |
Output from |
.by |
Pass NA if you want to plot samples without grouping. You can pass a character vector with one or several column names from ".meta" to group your data before plotting. In this case you should provide ".meta". You can pass a character vector that exactly matches the number of samples in your data, each value should correspond to a sample's property. It will be used to group data based on the values provided. Note that in this case you should pass NA to ".meta". |
.meta |
A metadata object. An R dataframe with sample names and their properties, such as age, serostatus or hla. |
.errorbars |
A numeric vector of length two with quantiles for error bars on sectors. Disabled if ".errorbars.off" is TRUE. |
.errorbars.off |
If TRUE then plot CI bars for distances between each group. Disabled if no group passed to the ".by" argument. |
.points |
A logical value defining whether points will be visualised or not. |
.test |
A logical vector whether statistical tests should be applied. See "Details" for more information. |
.signif.label.size |
An integer value defining the size of text for p-value. |
... |
Not used here. |
If data is grouped, then statistical tests for comparing means of groups will be performed, unless .test = FALSE is supplied.
In case there are only two groups, the Wilcoxon rank sum test (https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test) is performed
(R function wilcox.test() with an argument exact = FALSE) for testing if there is a difference in mean rank values between two groups.
In case there more than two groups, the Kruskal-Wallis test (https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance) is performed (R function kruskal.test()), that is equivalent to ANOVA for ranks and it tests whether samples from different groups originated from the same distribution.
A significant Kruskal-Wallis test indicates that at least one sample stochastically dominates one other sample.
Adjusted for multiple comparisons P-values are plotted on the top of groups.
P-value adjusting is done using the Holm method (https://en.wikipedia.org/wiki/Holm%E2%80%93Bonferroni_method) (also known as Holm-Bonferroni correction).
You can execute the command ?p.adjust in the R console to see more.
A ggplot2 object.
## Not run: data(immdata) dv <- repDiversity(immdata$data, "chao1") vis(dv) ## End(Not run)## Not run: data(immdata) dv <- repDiversity(immdata$data, "chao1") vis(dv) ## End(Not run)
An utility function to visualise the output from repClonality().
## S3 method for class 'immunr_clonal_prop' vis( .data, .by = NA, .meta = NA, .errorbars = c(0.025, 0.975), .errorbars.off = FALSE, .points = TRUE, .test = TRUE, .signif.label.size = 3.5, ... )## S3 method for class 'immunr_clonal_prop' vis( .data, .by = NA, .meta = NA, .errorbars = c(0.025, 0.975), .errorbars.off = FALSE, .points = TRUE, .test = TRUE, .signif.label.size = 3.5, ... )
.data |
Output from |
.by |
Pass NA if you want to plot samples without grouping. You can pass a character vector with one or several column names from ".meta" to group your data before plotting. In this case you should provide ".meta". You can pass a character vector that exactly matches the number of samples in your data, each value should correspond to a sample's property. It will be used to group data based on the values provided. Note that in this case you should pass NA to ".meta". |
.meta |
A metadata object. An R dataframe with sample names and their properties, such as age, serostatus or hla. |
.errorbars |
A numeric vector of length two with quantiles for error bars on sectors. Disabled if ".errorbars.off" is TRUE. |
.errorbars.off |
If TRUE then plot CI bars for distances between each group. Disabled if no group passed to the ".by" argument. |
.points |
A logical value defining whether points will be visualised or not. |
.test |
A logical vector whether statistical tests should be applied. See "Details" for more information. |
.signif.label.size |
An integer value defining the size of text for p-value. |
... |
Not used here. |
If data is grouped, then statistical tests for comparing means of groups will be performed, unless .test = FALSE is supplied.
In case there are only two groups, the Wilcoxon rank sum test (https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test) is performed
(R function wilcox.test() with an argument exact = FALSE) for testing if there is a difference in mean rank values between two groups.
In case there more than two groups, the Kruskal-Wallis test (https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance) is performed (R function kruskal.test()), that is equivalent to ANOVA for ranks and it tests whether samples from different groups originated from the same distribution.
A significant Kruskal-Wallis test indicates that at least one sample stochastically dominates one other sample.
Adjusted for multiple comparisons P-values are plotted on the top of groups.
P-value adjusting is done using the Holm method (https://en.wikipedia.org/wiki/Holm%E2%80%93Bonferroni_method) (also known as Holm-Bonferroni correction).
You can execute the command ?p.adjust in the R console to see more.
A ggplot2 object.
## Not run: data(immdata) clp <- repClonality(immdata$data, "clonal.prop") vis(clp) hom <- repClonality(immdata$data, "homeo") # Remove p values and points from the plot vis(hom, .by = "Status", .meta = immdata$meta, .test = FALSE, .points = FALSE) ## End(Not run)## Not run: data(immdata) clp <- repClonality(immdata$data, "clonal.prop") vis(clp) hom <- repClonality(immdata$data, "homeo") # Remove p values and points from the plot vis(hom, .by = "Status", .meta = immdata$meta, .test = FALSE, .points = FALSE) ## End(Not run)
## S3 method for class 'immunr_dynamics' vis(.data, .plot = c("smooth", "area", "line"), .order = NA, .log = FALSE, ...)## S3 method for class 'immunr_dynamics' vis(.data, .plot = c("smooth", "area", "line"), .order = NA, .log = FALSE, ...)
.data |
Output from the trackClonotypes function. |
.plot |
Character. Either "smooth", "area" or "line". Each specifies a type of plot for visualisation of clonotype dynamics. |
.order |
Numeric or character vector. Specifies the order to samples, e.g., it used for ordering samples by timepoints. Either See "Examples" below for more details. |
.log |
Logical. If TRUE then use log-scale for the frequency axis. |
... |
Not used here. |
A ggplot2 object.
## Not run: # Load an example data that comes with immunarch data(immdata) # Make the data smaller in order to speed up the examples immdata$data <- immdata$data[c(1, 2, 3, 7, 8, 9)] immdata$meta <- immdata$meta[c(1, 2, 3, 7, 8, 9), ] # Option 1 # Choose the first 10 amino acid clonotype sequences # from the first repertoire to track tc <- trackClonotypes(immdata$data, list(1, 10), .col = "aa") # Choose the first 20 nucleotide clonotype sequences # and their V genes from the "MS1" repertoire to track tc <- trackClonotypes(immdata$data, list("MS1", 20), .col = "nt+v") # Option 2 # Choose clonotypes with amino acid sequences "CASRGLITDTQYF" or "CSASRGSPNEQYF" tc <- trackClonotypes(immdata$data, c("CASRGLITDTQYF", "CSASRGSPNEQYF"), .col = "aa") # Option 3 # Choose the first 10 clonotypes from the first repertoire # with amino acid sequences and V segments target <- immdata$data[[1]] %>% select(CDR3.aa, V.name) %>% head(10) tc <- trackClonotypes(immdata$data, target) # Visualise the output regardless of the chosen option # Therea are three way to visualise it, regulated by the .plot argument vis(tc, .plot = "smooth") vis(tc, .plot = "area") vis(tc, .plot = "line") # Visualising timepoints # First, we create an additional column in the metadata with randomly choosen timepoints: immdata$meta$Timepoint <- sample(1:length(immdata$data)) immdata$meta # Next, we create a vector with samples in the right order, # according to the "Timepoint" column (from smallest to greatest): sample_order <- order(immdata$meta$Timepoint) # Sanity check: timepoints are following the right order: immdata$meta$Timepoint[sample_order] # Samples, sorted by the timepoints: immdata$meta$Sample[sample_order] # And finally, we visualise the data: vis(tc, .order = sample_order) ## End(Not run)## Not run: # Load an example data that comes with immunarch data(immdata) # Make the data smaller in order to speed up the examples immdata$data <- immdata$data[c(1, 2, 3, 7, 8, 9)] immdata$meta <- immdata$meta[c(1, 2, 3, 7, 8, 9), ] # Option 1 # Choose the first 10 amino acid clonotype sequences # from the first repertoire to track tc <- trackClonotypes(immdata$data, list(1, 10), .col = "aa") # Choose the first 20 nucleotide clonotype sequences # and their V genes from the "MS1" repertoire to track tc <- trackClonotypes(immdata$data, list("MS1", 20), .col = "nt+v") # Option 2 # Choose clonotypes with amino acid sequences "CASRGLITDTQYF" or "CSASRGSPNEQYF" tc <- trackClonotypes(immdata$data, c("CASRGLITDTQYF", "CSASRGSPNEQYF"), .col = "aa") # Option 3 # Choose the first 10 clonotypes from the first repertoire # with amino acid sequences and V segments target <- immdata$data[[1]] %>% select(CDR3.aa, V.name) %>% head(10) tc <- trackClonotypes(immdata$data, target) # Visualise the output regardless of the chosen option # Therea are three way to visualise it, regulated by the .plot argument vis(tc, .plot = "smooth") vis(tc, .plot = "area") vis(tc, .plot = "line") # Visualising timepoints # First, we create an additional column in the metadata with randomly choosen timepoints: immdata$meta$Timepoint <- sample(1:length(immdata$data)) immdata$meta # Next, we create a vector with samples in the right order, # according to the "Timepoint" column (from smallest to greatest): sample_order <- order(immdata$meta$Timepoint) # Sanity check: timepoints are following the right order: immdata$meta$Timepoint[sample_order] # Samples, sorted by the timepoints: immdata$meta$Sample[sample_order] # And finally, we visualise the data: vis(tc, .order = sample_order) ## End(Not run)
An utility function to visualise the output from repExplore().
## S3 method for class 'immunr_exp_vol' vis( .data, .by = NA, .meta = NA, .errorbars = c(0.025, 0.975), .errorbars.off = FALSE, .points = TRUE, .test = TRUE, .signif.label.size = 3.5, ... )## S3 method for class 'immunr_exp_vol' vis( .data, .by = NA, .meta = NA, .errorbars = c(0.025, 0.975), .errorbars.off = FALSE, .points = TRUE, .test = TRUE, .signif.label.size = 3.5, ... )
.data |
Output from |
.by |
Pass NA if you want to plot samples without grouping. You can pass a character vector with one or several column names from ".meta" to group your data before plotting. In this case you should provide ".meta". You can pass a character vector that exactly matches the number of samples in your data, each value should correspond to a sample's property. It will be used to group data based on the values provided. Note that in this case you should pass NA to ".meta". |
.meta |
A metadata object. An R dataframe with sample names and their properties, such as age, serostatus or hla. |
.errorbars |
A numeric vector of length two with quantiles for error bars on sectors. Disabled if ".errorbars.off" is TRUE. |
.errorbars.off |
If TRUE then plot CI bars for distances between each group. Disabled if no group passed to the ".by" argument. |
.points |
A logical value defining whether points will be visualised or not. |
.test |
A logical vector whether statistical tests should be applied. See "Details" for more information. |
.signif.label.size |
An integer value defining the size of text for p-value. |
... |
Not used here. |
If data is grouped, then statistical tests for comparing means of groups will be performed, unless .test = FALSE is supplied.
In case there are only two groups, the Wilcoxon rank sum test (https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test) is performed
(R function wilcox.test() with an argument exact = FALSE) for testing if there is a difference in mean rank values between two groups.
In case there more than two groups, the Kruskal-Wallis test (https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance) is performed (R function kruskal.test()), that is equivalent to ANOVA for ranks and it tests whether samples from different groups originated from the same distribution.
A significant Kruskal-Wallis test indicates that at least one sample stochastically dominates one other sample.
Adjusted for multiple comparisons P-values are plotted on the top of groups.
P-value adjusting is done using the Holm method (https://en.wikipedia.org/wiki/Holm%E2%80%93Bonferroni_method) (also known as Holm-Bonferroni correction).
You can execute the command ?p.adjust in the R console to see more.
A ggplot2 object.
## Not run: data(immdata) repExplore(immdata$data, "volume") %>% vis() repExplore(immdata$data, "count") %>% vis() repExplore(immdata$data, "len") %>% vis() repExplore(immdata$data, "clones") %>% vis() ## End(Not run)## Not run: data(immdata) repExplore(immdata$data, "volume") %>% vis() repExplore(immdata$data, "count") %>% vis() repExplore(immdata$data, "len") %>% vis() repExplore(immdata$data, "clones") %>% vis() ## End(Not run)
Visualise distributions of genes using heatmaps or other plots.
## S3 method for class 'immunr_gene_usage' vis(.data, .plot = c("hist", "box", "heatmap", "heatmap2", "circos"), ...)## S3 method for class 'immunr_gene_usage' vis(.data, .plot = c("hist", "box", "heatmap", "heatmap2", "circos"), ...)
.data |
Output from the geneUsage function. |
.plot |
String specifying the plot type:
|
... |
Other arguments passed to corresponding functions depending on the plot type:
|
A ggplot2 object, pheatmap or circlize object.
## Not run: data(immdata) gu <- geneUsage(immdata$data[[1]]) vis(gu) gu <- geneUsage(immdata$data) vis(gu, .by = "Status", .meta = immdata$meta) vis(gu, "box", .by = "Status", .meta = immdata$meta) ## End(Not run)## Not run: data(immdata) gu <- geneUsage(immdata$data[[1]]) vis(gu) gu <- geneUsage(immdata$data) vis(gu, .by = "Status", .meta = immdata$meta) vis(gu, "box", .by = "Status", .meta = immdata$meta) ## End(Not run)
Visualisation of the results of hierarchical clustering. For other clustering visualisations see vis.immunr_kmeans.
## S3 method for class 'immunr_hclust' vis(.data, .rect = FALSE, .plot = c("clust", "best"), ...)## S3 method for class 'immunr_hclust' vis(.data, .rect = FALSE, .plot = c("clust", "best"), ...)
.data |
Clustering results from repOverlapAnalysis or geneUsageAnalysis. |
.rect |
Passed to factoextra::fviz_dend - whether to add a rectangle around groups. |
.plot |
A character vector of length one or two specifying which plots to visualise. If "clust" then plot only the clustering. If "best" then plot the number of optimal clusters. If both then plot both. |
... |
Not used here. |
Ggplot2 objects inside the patchwork container.
vis, repOverlapAnalysis, geneUsageAnalysis
## Not run: data(immdata) ov <- repOverlap(immdata$data) repOverlapAnalysis(ov, "mds+hclust") %>% vis() ## End(Not run)## Not run: data(immdata) ov <- repOverlap(immdata$data) repOverlapAnalysis(ov, "mds+hclust") %>% vis() ## End(Not run)
## S3 method for class 'immunr_inc_overlap' vis(.data, .target = 1, .grid = FALSE, .ncol = 2, ...)## S3 method for class 'immunr_inc_overlap' vis(.data, .target = 1, .grid = FALSE, .ncol = 2, ...)
.data |
Output from the repOverlap function that uses "top" methods. |
.target |
Index of a repertoire to plot. Omitted if .grid is TRUE. |
.grid |
Logical. If TRUE then plot all similarities in a grid. |
.ncol |
Numeric. Number of columns in the resulting grid. |
... |
Not used here. |
A ggplot2 object.
## Not run: data(immdata) tmp <- repOverlap(immdata$data[1:4], "inc+overlap", .verbose.inc = FALSE, .verbose = FALSE) vis(tmp, .target = 1) vis(tmp, .grid = TRUE) ## End(Not run)## Not run: data(immdata) tmp <- repOverlap(immdata$data[1:4], "inc+overlap", .verbose.inc = FALSE, .verbose = FALSE) vis(tmp, .target = 1) vis(tmp, .grid = TRUE) ## End(Not run)
Visualisation of the results of K-means and DBSCAN clustering. For hierarhical clustering visualisations see vis.immunr_hclust.
## S3 method for class 'immunr_kmeans' vis( .data, .point = TRUE, .text = TRUE, .ellipse = TRUE, .point.size = 2, .text.size = 10, .plot = c("clust", "best"), ... )## S3 method for class 'immunr_kmeans' vis( .data, .point = TRUE, .text = TRUE, .ellipse = TRUE, .point.size = 2, .text.size = 10, .plot = c("clust", "best"), ... )
.data |
Clustering results from repOverlapAnalysis or geneUsageAnalysis. |
.point |
If TRUE then plot sample points. Passed to factoextra::fviz_cluster. |
.text |
If TRUE then plot text labels. Passed to factoextra::fviz_cluster. |
.ellipse |
If TRUE then plot ellipses around all samples. Passed to "ellipse" from factoextra::fviz_cluster. |
.point.size |
Size of points, passed to "pointsize" from factoextra::fviz_cluster. |
.text.size |
Size of text labels, passed to labelsize from factoextra::fviz_cluster. |
.plot |
A character vector of length one or two specifying which plots to visualise. If "clust" then plot only the clustering. If "best" then plot the number of optimal clusters. If both then plot both. |
... |
Not used here. |
Ggplot2 objects inside the pathwork container.
vis, repOverlapAnalysis, geneUsageAnalysis
## Not run: data(immdata) ov <- repOverlap(immdata$data) repOverlapAnalysis(ov, "mds+kmeans") %>% vis() ## End(Not run)## Not run: data(immdata) ov <- repOverlap(immdata$data) repOverlapAnalysis(ov, "mds+kmeans") %>% vis() ## End(Not run)
Plot a distribution (bar plot) of the most frequent kmers in a data.
## S3 method for class 'immunr_kmer_table' vis( .data, .head = 100, .position = c("stack", "dodge", "fill"), .log = FALSE, ... )## S3 method for class 'immunr_kmer_table' vis( .data, .head = 100, .position = c("stack", "dodge", "fill"), .log = FALSE, ... )
.data |
Data frame with two columns "Kmers" and "Count" or a list with such data frames. See Examples. |
.head |
Number of the most frequent kmers to choose for plotting from each data frame. |
.position |
Character vector of length 1. Position of bars for each kmers. Value for the |
.log |
Logical. If TRUE then plot log-scaled plots. |
... |
Not used here. |
A ggplot2 object.
get.kmers
## Not run: # Load necessary data and package. data(immdata) # Get 5-mers. imm.km <- getKmers(immdata$data[[1]], 5) # Plots for kmer proportions in each data frame in immdata. p1 <- vis(imm.km, .position = "stack") p2 <- vis(imm.km, .position = "fill") p1 + p2 ## End(Not run)## Not run: # Load necessary data and package. data(immdata) # Get 5-mers. imm.km <- getKmers(immdata$data[[1]], 5) # Plots for kmer proportions in each data frame in immdata. p1 <- vis(imm.km, .position = "stack") p2 <- vis(imm.km, .position = "fill") p1 + p2 ## End(Not run)
## S3 method for class 'immunr_mds' vis( .data, .by = NA, .meta = NA, .point = TRUE, .text = TRUE, .ellipse = TRUE, .point.size = 2, .text.size = 4, ... )## S3 method for class 'immunr_mds' vis( .data, .by = NA, .meta = NA, .point = TRUE, .text = TRUE, .ellipse = TRUE, .point.size = 2, .text.size = 4, ... )
.data |
Output from analysis functions such as geneUsageAnalysis or immunr_pca, immunr_mds or immunr_tsne. |
.by |
Pass NA if you want to plot samples without grouping. You can pass a character vector with one or several column names from ".meta" to group your data before plotting. In this case you should provide ".meta". You can pass a character vector that exactly matches the number of samples in your data, each value should correspond to a sample's property. It will be used to group data based on the values provided. Note that in this case you should pass NA to ".meta". |
.meta |
A metadata object. An R dataframe with sample names and their properties, such as age, serostatus or hla. |
.point |
Logical. If TRUE then plot points corresponding to objects. |
.text |
Logical. If TRUE then plot sample names. |
.ellipse |
Logical. If TRUE then plot ellipses around clusters of grouped samples. |
.point.size |
Numeric. A size of points to plot. |
.text.size |
Numeric. A size of sample names' labels. |
... |
Not used here. |
Other visualisation methods:
PCA - vis.immunr_pca
MDS - vis.immunr_mds
tSNE - vis.immunr_tsne
A ggplot2 object.
## Not run: data(immdata) ov <- repOverlap(immdata$data) repOverlapAnalysis(ov, "mds") %>% vis() ## End(Not run)## Not run: data(immdata) ov <- repOverlap(immdata$data) repOverlapAnalysis(ov, "mds") %>% vis() ## End(Not run)
Visualises matrices with overlap values or gene usage distances among samples. For details see the links below.
## S3 method for class 'immunr_ov_matrix' vis(.data, .plot = c("heatmap", "heatmap2", "circos"), ...)## S3 method for class 'immunr_ov_matrix' vis(.data, .plot = c("heatmap", "heatmap2", "circos"), ...)
.data |
Output from repOverlap or geneUsageAnalysis. |
.plot |
A string specifying the plot type:
|
... |
Other arguments are passed through to the underlying plotting function:
|
A ggplot2, pheatmap or circlize object.
## Not run: data(immdata) ov <- repOverlap(immdata$data) vis(ov) vis(ov, "heatmap") vis(ov, "heatmap2") vis(ov, "circos") ## End(Not run)## Not run: data(immdata) ov <- repOverlap(immdata$data) vis(ov) vis(ov, "heatmap") vis(ov, "heatmap2") vis(ov, "circos") ## End(Not run)
## S3 method for class 'immunr_public_repertoire' vis(.data, .plot = c("freq", "clonotypes"), ...)## S3 method for class 'immunr_public_repertoire' vis(.data, .plot = c("freq", "clonotypes"), ...)
.data |
Public repertoire, an output from pubRep. |
.plot |
A string specifying the plot type:
|
... |
Further arguments passed vis_public_frequencies or vis_public_clonotypes, depending on the ".plot" argument. |
A ggplot2 object.
## Not run: data(immdata) immdata$data <- lapply(immdata$data, head, 300) pr <- pubRep(immdata$data, .verbose = FALSE) vis(pr, "freq") vis(pr, "freq", .type = "none") vis(pr, "clonotypes", 1, 2) ## End(Not run)## Not run: data(immdata) immdata$data <- lapply(immdata$data, head, 300) pr <- pubRep(immdata$data, .verbose = FALSE) vis(pr, "freq") vis(pr, "freq", .type = "none") vis(pr, "clonotypes", 1, 2) ## End(Not run)
Visualise public clonotype frequencies.
## S3 method for class 'immunr_public_statistics' vis(.data, ...)## S3 method for class 'immunr_public_statistics' vis(.data, ...)
.data |
Public repertoire - an output from the pubRep function. |
... |
Other arguments passsed directly to UpSetR::upset. |
A ggplot2 object.
## Not run: data(immdata) immdata$data <- lapply(immdata$data, head, 2000) pr <- pubRep(immdata$data, .verbose = FALSE) pubRepStatistics(pr) %>% vis() ## End(Not run)## Not run: data(immdata) immdata$data <- lapply(immdata$data, head, 2000) pr <- pubRep(immdata$data, .verbose = FALSE) pubRepStatistics(pr) %>% vis() ## End(Not run)
## S3 method for class 'step_failure_ignored' vis(.data, ...)## S3 method for class 'step_failure_ignored' vis(.data, ...)
.data |
Not used here. |
... |
Not used here. |
An empty object with "step_failure_ignored" class.