This function removes entries mapping to proteins that are identified by fewer than n_peptides peptides. Removing single-hit proteins from an analysis can significantly increase the sensitivity under strict protein FDR criteria, as evaluated by e.g. assess_fdr_overall.

filter_on_min_peptides(
  data,
  n_peptides,
  protein_col = "ProteinName",
  peptide_col = c("Peptide.Sequence", "FullPeptideName"),
  rm.decoy = TRUE
)

Arguments

data

Data table that is produced by the OpenSWATH/iPortal workflow.

n_peptides

Minimum number of peptide IDs that must be associated with a protein ID for that protein to be kept in the dataset.

protein_col

Column with protein identifiers. Default: ProteinName

peptide_col

Column with peptide identifiers. Default: Peptide.Sequence or FullPeptideName

rm.decoy

Option to remove the decoys during filtering. Default: TRUE

Value

Returns the filtered data frame with only peptides that map to proteins with >= n_peptides peptides.
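
The filtering principle described above can be illustrated with a minimal base R sketch. This is not the SWATH2stats implementation: the helper `min_peptides_sketch` and the toy data frame are hypothetical, only the column names follow the documented defaults, and decoy handling (`rm.decoy`) is left out entirely.

```r
# Toy data; column names follow the documented defaults.
# The last row repeats a peptide of P1 (e.g. measured in another run).
toy <- data.frame(
  ProteinName     = c("P1", "P1", "P1", "P2", "P2", "P3", "P1"),
  FullPeptideName = c("AAK", "CCR", "DDK", "EEK", "FFR", "GGK", "AAK"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, for illustration only (not part of SWATH2stats).
min_peptides_sketch <- function(data, n_peptides,
                                protein_col = "ProteinName",
                                peptide_col = "FullPeptideName") {
  # Reduce to distinct protein/peptide pairs so repeated rows count once.
  pairs <- unique(data[, c(protein_col, peptide_col)])
  # Number of distinct peptides per protein.
  counts <- table(pairs[[protein_col]])
  # Proteins that reach the minimum peptide count.
  keep <- names(counts)[counts >= n_peptides]
  # Keep every row that maps to one of those proteins.
  data[data[[protein_col]] %in% keep, , drop = FALSE]
}

filtered <- min_peptides_sketch(toy, n_peptides = 2)
unique(filtered$ProteinName)
```

In this toy case P3 has a single peptide and is dropped, while all rows of P1 and P2 are retained, including the repeated peptide row.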

Author

Moritz Heusel, Peter Blattmann

Examples

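{
 # Load the example data and annotate it with the study design
 data("OpenSWATH_data", package = "SWATH2stats")
 data("Study_design", package = "SWATH2stats")
 data <- sample_annotation(OpenSWATH_data, Study_design)
 # Filter on m-score (0.01) and observation frequency (0.8)
 data.filtered <- filter_mscore_freqobs(data, 0.01, 0.8)
 # Keep at most 5 peptides per protein
 data.max <- filter_on_max_peptides(data.filtered, 5)
 # Drop proteins identified by fewer than 3 peptides
 data.min.max <- filter_on_min_peptides(data.max, 3)
 }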
#> Treshold, peptides need to have been quantified in more conditions than: 4.8
#> Fraction of peptides selected: 0.42
#> Dimension difference: 1323, 0
#> Before filtering: 
#>   Number of proteins: 10
#>   Number of peptides: 133
#> 
#> Percentage of peptides removed: 69.17%
#> 
#> After filtering: 
#>   Number of proteins: 10
#>   Number of peptides: 41
#> Before filtering: 
#>   Number of proteins: 10
#>   Number of peptides: 41
#> 
#> Percentage of peptides removed: 7.32%
#> 
#> After filtering: 
#>   Number of proteins: 8
#>   Number of peptides: 38
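
The two "Percentage of peptides removed" values in the output above are consistent with the reported peptide counts, which can be checked with a few lines of arithmetic. The helper `pct_removed` below is hypothetical and only assumes that the percentage is the relative drop in the number of peptides; it is not taken from the package source.

```r
# Hypothetical helper: relative drop in peptide count, in percent.
pct_removed <- function(before, after) {
  round(100 * (before - after) / before, 2)
}

# First before/after block in the output: 133 -> 41 peptides.
max_step <- pct_removed(133, 41)
# Second before/after block in the output: 41 -> 38 peptides.
min_step <- pct_removed(41, 38)

max_step  # 69.17
min_step  # 7.32
```

In the second block the protein count falls from 10 to 8 while only 3 of 41 peptides are lost, which is what one would expect when the removed proteins were each supported by very few peptides.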