Background
Molecular Indexing of Proteins by Self-Assembly (MIPSA) is a novel and
proprietary method to construct DNA-barcoded peptide and protein
libraries for antibody reactome profiling applications at Infinity
Bio.
Antibody reactome profiling is the comprehensive analysis of antibody
binding (“antibody reactivity”) to a large set of specific antigens
(typically > 10,000). Antibodies, also known as immunoglobulins (Ig),
are proteins produced by the immune system in response to the presence
of foreign substances, such as pathogens, allergens, or even an
individual’s own proteins in the case of autoimmune diseases. Antibody
profiling can be performed on monoclonal or polyclonal antibodies,
including the highly complex mixture of antibodies present in biological
fluids like serum or plasma. With MIPSA, antibody reactivities are
detected via immunocapture and high-throughput sequencing of
antigen-conjugated DNA barcodes. The resulting sequencing data is then
processed through Infinity Bio’s bioinformatics analysis pipeline to
detect the reactive antibodies.
This report details the reactivities detected towards viral proteins
in your samples using Infinity Bio’s VirSIGHT MIPSA library. This
library consists of 286,793 peptides designed to represent 94,466 unique
protein targets spanning all taxa of viruses currently known to infect
human cells.
Samples are processed in 96-well plate format. The 12th column is
reserved for Infinity Bio’s internal controls, including 3 positive and
negative library-specific serum controls, and 5 PBS (phosphate-buffered
saline) input “mock IP” or “beads only” controls that are used for
downstream fold-over-background calculations. Each sample, including the
serum controls and each mock IP control, is individually compared
against the complete set of mock IPs associated with the project. This
comparison uses the edgeR software package for data
normalization, statistical testing, and fold-over-background estimation.
For each sample, “hits” are defined as library members with FDR-adjusted
p-values < 0.05, rejecting the null hypothesis that the data are drawn
from the background distribution.
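To illustrate the hit criterion, here is a minimal Python sketch that applies a Benjamini-Hochberg adjustment to a list of raw p-values and flags those below 0.05. It is not Infinity Bio's pipeline, which uses edgeR; the function names `bh_adjust` and `call_hits` and the example p-values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Indices of the p-values, smallest p first.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_hits(pvals, threshold=0.05):
    """True where the FDR-adjusted p-value is below the threshold."""
    return [q < threshold for q in bh_adjust(pvals)]


# Hypothetical raw p-values for four library members in one sample.
print(call_hits([0.001, 0.04, 0.03, 0.5]))  # [True, False, False, False]
```

Note that the second and third library members have raw p-values below 0.05 but are not hits after adjustment.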
To ensure the quality of each data set, we confirm the following
control performance metrics are achieved prior to data release:
- The mapped barcode reads per sample must be at least 10-times the
library complexity for at least 90% of the project samples.
- The Pearson correlation of -log10 FDR-adjusted p-values
among the same internal serum controls must average at least 0.90.
- The average number of false positives detected per mock IP must be
less than 1.
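The three release criteria above can be expressed as simple checks. The sketch below is illustrative only: it assumes that library complexity equals the 286,793 peptides stated for VirSIGHT, and the function name `passes_qc` and its input layout are hypothetical.

```python
from statistics import mean

LIBRARY_SIZE = 286_793  # VirSIGHT peptide count stated in this report


def passes_qc(sample_reads, control_correlations, mock_ip_false_positives):
    """Return pass/fail flags for the three release criteria.

    sample_reads: mapped barcode reads for each project sample
    control_correlations: Pearson correlations among the same serum controls
    mock_ip_false_positives: number of hits detected in each mock IP
    """
    # A sample is adequately covered at >= 10 reads per library member.
    covered = [reads >= 10 * LIBRARY_SIZE for reads in sample_reads]
    return {
        "depth": sum(covered) / len(covered) >= 0.90,
        "control_correlation": mean(control_correlations) >= 0.90,
        "mock_ip_false_positives": mean(mock_ip_false_positives) < 1,
    }
```

With these assumptions, a sample needs at least 2,867,930 mapped reads to count toward the depth criterion.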
A report of your project QC metrics and a summary of the detected
antibody reactivities for your project are detailed below.
Project QC
Depths/Coverage Summary
Figure 1. Bar plot of number of aligned barcode read counts per sample. Samples are
colored by sample type, including project samples (smpl), mock IP
samples (ctrlPbs) and serum control samples (ctrlSmpl). Samples are
arranged in ascending order by aligned reads and faceted by sample type.
Horizontal dashed line indicates number of reads required for 10-times
the library complexity.
Mean Aligned Reads | 5562181
Mean Sample Coverage | 19.39
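The two summary values are related through the coverage definition used in this report (aligned reads divided by the number of library peptide members). As a quick arithmetic check, assuming the 286,793-peptide VirSIGHT library size stated in the Background, a minimal sketch with a hypothetical `coverage` helper:

```python
LIBRARY_SIZE = 286_793  # VirSIGHT peptide count stated in this report


def coverage(aligned_reads, library_size=LIBRARY_SIZE):
    """Average number of reads per library member."""
    return aligned_reads / library_size


# Mean aligned reads reported above.
print(round(coverage(5_562_181), 2))  # 19.39
```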
Positive Control Correlations
Figure 2. Correlation heat map of all serum control samples (ctrlSmpl). Heat map
shows Pearson correlation values (-1 to 1) of -log10 FDR-adjusted
p-values for each serum control versus every other serum control. Serum
control samples are hierarchically clustered by similarity as determined
by Pearson correlation. The Pearson correlation among the same
internal serum controls must average at least 0.90. If no
serum controls of this type were run for the project, an Infinity Bio
logo will appear instead.
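For readers who want to reproduce this metric from the adjusted p-value table, the following is a minimal sketch of an average pairwise Pearson correlation. It assumes each serum control is represented as a list of -log10 FDR-adjusted p-values in the same library-member order; the names `pearson` and `mean_pairwise_correlation` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_pairwise_correlation(profiles):
    """Average Pearson correlation over every pair of control profiles.

    profiles: dict mapping control name -> list of -log10 adjusted p-values
    (requires at least two controls).
    """
    pairs = list(combinations(profiles.values(), 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)
```

The result would then be compared against the 0.90 requirement for controls of the same type.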
Data Overview
Reactivities Per Sample
Figure 3. Bar plot of number of hits (reactivities) detected per sample. Samples are
colored by sample type, including project samples (smpl), mock IP
background controls (ctrlPbs) and serum control samples (ctrlSmpl).
Samples are arranged in ascending order by detected hits and faceted by
sample type.
Top Reactivities Across Samples
Top 20 Hits (Peptides)
Figure 4. Heat map of top 20 reactivities across samples at the peptide level. Sum Hits
Fold Over Background values were calculated for each peptide found in
project samples (excluding controls), and the top 20 were retained for
plotting. Rows (peptides) and columns (samples) were hierarchically
clustered using Euclidean distance. If no non-control samples were run
for the project, an Infinity Bio logo will appear instead.
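One way to read the peptide selection described in this caption is as a per-peptide sum of Hits Fold Over Background values across project samples, followed by a top-20 cut. The sketch below makes that assumption explicit; the function name `top_peptides` and the nested-dict input layout are hypothetical, and the hierarchical clustering step is not shown.

```python
def top_peptides(hits_fob, project_samples, n=20):
    """Rank peptides by summed hits fold-over-background in project samples.

    hits_fob: dict mapping peptide -> {sample: hits fold-over-background},
              with 0 for non-hits
    project_samples: sample names to include (controls are left out)
    """
    totals = {
        peptide: sum(by_sample.get(sample, 0.0) for sample in project_samples)
        for peptide, by_sample in hits_fob.items()
    }
    return sorted(totals, key=totals.get, reverse=True)[:n]
```

Control columns are ignored simply by leaving them out of `project_samples`.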
Navigating Your Data Return
Your data delivery package contains this project-level report, a
sub-directory of per-sample reports, and a set of data table outputs. If
applicable, your data will be separated into sub-folders by library.
Each folder contains this project report, which includes all the
definitions and terms relevant to your data. We suggest reviewing the
data in the following order:
- Infinity Bio_IB0000_VirSIGHT_Report.html: this file, to be
opened in an internet browser, is the project-level report which
provides QC information, overview reporting and analysis
terminology.
- IB0000_VirSIGHT_pvals-Adjusted.tsv: this file provides the
edgeR-derived statistical comparison of each library member’s barcode
counts in a sample to those from the mock IP controls. P-values are FDR
corrected (using the Benjamini-Hochberg procedure) and reported as
-log10(FDR-adjusted p-values). Additionally, each value carries a sign
(+/-) indicating the direction of change relative to the mock IP controls:
positive values indicate enrichment in the sample. We also provide the corresponding
IB0000_VirSIGHT_pvals-Unadjusted.tsv file if you would like the
unadjusted p-values.
- IB0000_VirSIGHT_Hits_Fold-Over-Background.tsv: this file
provides the edgeR-normalized fold-over-background value of each library
member’s barcode count in a sample versus that from the mock IP
controls, reporting only values from the hits (as determined in the
IB0000_VirSIGHT_pvals-Adjusted.tsv file). Non-hit
fold-over-background values are reported as “0”. We also provide the
full set of fold-over-background values in the corresponding
IB0000_VirSIGHT_Fold-Over-Background.tsv file.
- IB0000_VirSIGHT_Counts.tsv: this file provides the raw
barcode counts for each peptide in each sample, which is used for
downstream statistical analyses. This file may not be useful for you,
but it is provided in case you are curious.
- IB0000_VirSIGHT_Project-Summary.csv: this file provides the
QC data that are plotted in the project-level report.
- Individual sample reports are also provided for your convenience,
although all this information is also captured in the files mentioned
above.
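As a starting point for working with the adjusted p-value table, here is a minimal Python sketch that decodes the signed -log10 values and lists enriched hits per sample. The table layout is an assumption (first column a peptide identifier, one column per sample); check the header of your own file before relying on it. The function names and the inline example table are hypothetical.

```python
import csv
import io


def decode_signed_neglog10(value):
    """Split a signed -log10(FDR-adjusted p) into (adjusted p, enriched?)."""
    adjusted_p = 10 ** (-abs(value))
    return adjusted_p, value > 0


def read_hits(tsv_text, threshold=0.05):
    """Return {sample: [peptide, ...]} for enriched members below threshold."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]  # assumed layout: peptide id, then one column per sample
    hits = {sample: [] for sample in samples}
    for row in reader:
        peptide = row[0]
        for sample, cell in zip(samples, row[1:]):
            adjusted_p, enriched = decode_signed_neglog10(float(cell))
            if enriched and adjusted_p < threshold:
                hits[sample].append(peptide)
    return hits


# Hypothetical two-peptide, two-sample table.
example = "peptide\ts1\ts2\npepA\t3.0\t-2.0\npepB\t0.5\t1.5\n"
print(read_hits(example))  # {'s1': ['pepA'], 's2': ['pepB']}
```

On this scale, an adjusted p-value of 0.05 corresponds to a magnitude of about 1.30, so values above +1.30 are the enriched hits.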
A note about comparing significance (p) values versus
fold-over-background values. The relative input abundance of each
library member determines the confidence with which reactivities can be
detected. Significance values of a given library member can be compared
across samples, but significance values of two library members with
differing input abundance may not be comparable within a sample.
Fold-change values of two library members with differing input
abundances are comparable within a sample but might be discordant across
samples if the adjusted p-values are near the 0.05 significance
threshold. Despite these considerations, the vast majority of
significance values are well correlated with fold-over-background
values, so in most cases these values can be used relatively
interchangeably.
Please review this document and the related files and then schedule a
meeting with our team to answer any questions you might have. In the
meantime, please let us know if you have any issues – we are always here
to help!
Methods
For antibody profiling projects, samples are first accessioned into
Infinity Bio’s sample tracking system. Samples are then mixed with the
VirSIGHT MIPSA reagent, which is composed of peptides 20-63 amino acids
long, each with a set of unique DNA barcodes. Antibodies in samples
bind to antigens in the library, and antibodies and their bound members
are captured via Dyna magnetic beads coated with protein A/G. Barcodes
on captured antigen library members are amplified via PCR, and PCR
products are submitted for sequencing on an Illumina platform.
Once sequencing results are obtained, FASTQ reads for each individual
sample are run through the Infinity Bio bioinformatics pipeline to
process and compare each sample in the following ways:
- Reads for each sample are mapped using Bowtie2 to a dictionary of mono-associated
barcode-peptide pairs, which is filtered to contain only
sequence-validated library members specific to the VirSIGHT
library.
- Each sample is statistically compared to the set of mock IP controls
using the R package edgeR. Detected peptide reactivities (“hits”) are
then called for each sample: a library member is a hit when its
edgeR-derived FDR-adjusted p-value is < 0.05, rejecting the null
hypothesis that the sample read count is drawn from the mock IP
background distribution.
Terminology
- Depth: number of mapped barcode sequencing reads
produced via Illumina sequencing for each sample in a project.
- Counts: integer values for the number of unique
barcodes detected from each member of the VirSIGHT library for each
sample.
- Coverage: the average number of times each peptide
library member of the VirSIGHT library is sequenced in a sample. This is
calculated as Num Aligned Reads/Num Library Peptide Members for
each sample. We require a coverage of at least 10 in at least 90% of the
project samples.
- Mock IP control: an assay reaction that includes
all input reagents but with PBS in place of sample, which undergoes all
steps of the assay process alongside test sample reactions. Designated
as “ctrlPbs” in data outputs.
- Hit: a peptide library member that was determined
to be significantly enriched (FDR-adjusted p-value < 0.05) in a given
sample compared to the set of mock IPs. Also referred to as a reactive
peptide.
- Fold Over Background: numerical value calculated
via edgeR indicating the magnitude of difference in counts for a given
library member detected in a sample compared to the set of mock IPs
associated with the project.
- Hits Fold Over Background: fold over background
values for each library member that was determined to be a hit in a
given sample. All non-hit values are set to 0.
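The relationship between Fold Over Background and Hits Fold Over Background can be sketched as a simple masking step. This is an illustration of the definitions above rather than the pipeline's code; the function name `mask_non_hits` and the example values are hypothetical.

```python
def mask_non_hits(fold_over_background, adjusted_pvals, threshold=0.05):
    """Keep fold-over-background for hits; set all non-hit values to 0.

    Both inputs are lists in the same library-member order for one sample.
    """
    return [
        fob if adjusted_p < threshold else 0
        for fob, adjusted_p in zip(fold_over_background, adjusted_pvals)
    ]


# Three hypothetical library members; only the first has adjusted p < 0.05.
print(mask_non_hits([12.5, 1.1, 3.0], [0.001, 0.6, 0.05]))  # [12.5, 0, 0]
```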
Infinity Bio, Inc. © 2024