R/filter_on_min_peptides.R
filter_on_min_peptides.Rd
This function removes entries mapping to proteins that are identified by fewer than n_peptides peptides. Removing single-hit proteins from an analysis can significantly increase the sensitivity under strict protein FDR criteria, as evaluated by e.g. assess_fdr_overall.
filter_on_min_peptides(
data,
n_peptides,
protein_col = "ProteinName",
peptide_col = c("Peptide.Sequence", "FullPeptideName"),
rm.decoy = TRUE
)
data: Data table that is produced by the OpenSWATH/iPortal workflow.
n_peptides: Minimum number of peptide IDs that must be associated with a protein ID for that protein to be kept in the dataset.
protein_col: Column with protein identifiers. Default: ProteinName
peptide_col: Column with peptide identifiers. Default: Peptide.Sequence or FullPeptideName
rm.decoy: Option to remove the decoys during filtering. Default: TRUE
Returns the filtered data frame with only peptides that map to proteins with >= n_peptides peptides.
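The filtering rule described above can be illustrated with a minimal base R sketch. This is not the package's implementation: sketch_min_peptides is a hypothetical helper, the toy values are made up, and decoy handling (rm.decoy) is left out. It only assumes that a protein is kept when at least n_peptides distinct peptide identifiers map to it.

```r
# Toy data: column names follow the defaults in the usage above,
# the values are invented for illustration only.
toy <- data.frame(
  ProteinName = c("P1", "P1", "P1", "P2", "P2", "P3"),
  FullPeptideName = c("AAK", "CCK", "DDK", "EEK", "EEK", "FFK"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of SWATH2stats.
sketch_min_peptides <- function(data, n_peptides,
                                protein_col = "ProteinName",
                                peptide_col = "FullPeptideName") {
  # Reduce to distinct protein/peptide pairs so repeated measurements
  # of the same peptide are counted once.
  pairs <- unique(data[, c(protein_col, peptide_col)])
  # Number of distinct peptides per protein.
  counts <- table(pairs[[protein_col]])
  # Proteins that reach the minimum peptide count.
  keep <- names(counts)[counts >= n_peptides]
  # Keep every row belonging to a retained protein.
  data[data[[protein_col]] %in% keep, , drop = FALSE]
}

filtered <- sketch_min_peptides(toy, n_peptides = 2)
unique(filtered$ProteinName)
```

In this toy input, P2 has two rows but only one distinct peptide, so with n_peptides = 2 it is dropped together with the single-hit protein P3.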
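{
# Load the example data and study design shipped with SWATH2stats
data("OpenSWATH_data", package = "SWATH2stats")
data("Study_design", package = "SWATH2stats")
# Annotate the raw data with the study design
data <- sample_annotation(OpenSWATH_data, Study_design)
# Filter on m-score (0.01) and on the fraction of runs (0.8)
data.filtered <- filter_mscore_freqobs(data, 0.01, 0.8)
# Keep at most 5 peptides per protein, then drop proteins with fewer than 3
data.max <- filter_on_max_peptides(data.filtered, 5)
data.min.max <- filter_on_min_peptides(data.max, 3)
}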
#> Treshold, peptides need to have been quantified in more conditions than: 4.8
#> Fraction of peptides selected: 0.42
#> Dimension difference: 1323, 0
#> Before filtering:
#> Number of proteins: 10
#> Number of peptides: 133
#>
#> Percentage of peptides removed: 69.17%
#>
#> After filtering:
#> Number of proteins: 10
#> Number of peptides: 41
#> Before filtering:
#> Number of proteins: 10
#> Number of peptides: 41
#>
#> Percentage of peptides removed: 7.32%
#>
#> After filtering:
#> Number of proteins: 8
#> Number of peptides: 38