This function transforms the column names of a data frame from another format into the column names used in the OpenSWATH output and required by the functions of this package. While the function runs, the corresponding OpenSWATH column must be selected for each column in the data. For columns that do not correspond to any OpenSWATH column, 'not applicable' must be selected; the names of these columns are left unchanged.

import_data(data)

Arguments

data

A data frame containing the SWATH-MS data (one line per peptide precursor quantified) but with different column names.

Value

Returns the data frame in the appropriate format.
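
The interactive selection amounts to renaming columns. As a minimal, non-interactive sketch of that idea in base R (this is not the package's implementation; the input column names Protein, Peptide, z, Run and Comment and the helper rename_columns are hypothetical):

```r
# Hypothetical input with non-OpenSWATH column names.
raw <- data.frame(
  Protein = c("1/ProtA", "1/ProtB"),
  Peptide = c("TTLHQAILMGR", "SVLEELK"),
  z       = c(3L, 2L),
  Run     = c("run0", "run0"),
  Comment = c("a", "b"),
  stringsAsFactors = FALSE
)

# Mapping from existing names to OpenSWATH names.
# Columns absent from the mapping (the 'not applicable' case) keep their names.
mapping <- c(
  Protein = "ProteinName",
  Peptide = "FullPeptideName",
  z       = "Charge",
  Run     = "filename"
)

# Hypothetical helper: rename only the columns listed in the mapping.
rename_columns <- function(df, mapping) {
  hit <- names(df) %in% names(mapping)
  names(df)[hit] <- unname(mapping[names(df)[hit]])
  df
}

renamed <- rename_columns(raw, mapping)
names(renamed)
```

A fixed mapping like this is reproducible in scripts, whereas import_data asks for each column while it runs.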

Note

List of column names of the OpenSWATH data:

ProteinName: Unique identifier for the protein or protein group that the peptide maps to. Proteotypic peptides should be indicated by 1/ in order to be recognized as such by the function filter_proteotypic_peptides.
FullPeptideName: Unique identifier for the peptide.
Charge: Charge of the peptide precursor ion quantified.
Sequence: Naked peptide sequence without modifications.
aggr_Fragment_Annotation: Aggregated annotation for the different fragments quantified for this peptide. In the OpenSWATH results, the individual annotations are concatenated with semicolons.
aggr_Peak_Area: Aggregated intensity values for the different fragments quantified for this peptide. In the OpenSWATH results, the individual peak area intensities are concatenated with semicolons.
transition_group_id: A unique identifier for each transition group used.
decoy: Indicates with 1 or 0 whether this transition group is a decoy.
m_score: Column containing the score that is used to estimate FDR or to filter. M-score values of identified peak groups are equivalent to a q-value and are thus typically smaller than 0.01, depending on the confidence of identification (the lower the m-score, the higher the confidence).
RT: Column containing the retention time of the quantified peak.
filename: Column containing the filename or a unique identifier for each injection.
Intensity: Column containing the intensity value for each quantified peptide.

Columns needed for FDR estimation and filtering functions: ProteinName, FullPeptideName, transition_group_id, decoy, m_score

Columns needed for conversion to transition-level format (needed for MSstats and mapDIA input): aggr_Fragment_Annotation, aggr_Peak_Are
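
The column requirements in this note can be checked before running downstream functions. The following is a small sketch in base R, not part of the package: the helper missing_columns is hypothetical, and the one-row data frame reuses values from the Spyogenes output in the Examples, shortened to two fragments for illustration.

```r
# Columns the note lists as needed for FDR estimation and filtering.
fdr_columns <- c("ProteinName", "FullPeptideName", "transition_group_id",
                 "decoy", "m_score")

# Hypothetical helper: names of required columns that are missing from df.
missing_columns <- function(df, required) {
  setdiff(required, names(df))
}

# Illustrative one-row data frame (aggregated fields shortened to two fragments).
example_row <- data.frame(
  ProteinName = "Spyo_Exp3652_DDB_SeqID_515468",
  FullPeptideName = "SVLEELK",
  transition_group_id = "10107_SVLEELK/2_run0",
  decoy = FALSE,
  m_score = 1.365587e-05,
  aggr_Fragment_Annotation = "58499_SVLEELK/2_y5;58500_SVLEELK/2_y3",
  aggr_Peak_Area = "11139.000000;1968.000000",
  stringsAsFactors = FALSE
)

# An empty result means every required column is present.
missing_columns(example_row, fdr_columns)

# Split the semicolon-concatenated fields into one entry per fragment.
fragments <- strsplit(example_row$aggr_Fragment_Annotation, ";", fixed = TRUE)[[1]]
areas <- as.numeric(strsplit(example_row$aggr_Peak_Area, ";", fixed = TRUE)[[1]])
```

Here fragments and areas have the same length, so each fragment annotation lines up with its peak area.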

Author

Peter Blattmann

Examples

 data('Spyogenes', package = 'SWATH2stats')
 head(data)
#>                     ProteinName FullPeptideName Charge
#> 1 Spyo_Exp3652_DDB_SeqID_520043     TTLHQAILMGR      3
#> 2 Spyo_Exp3652_DDB_SeqID_515468         SVLEELK      2
#> 3 Spyo_Exp3652_DDB_SeqID_325989  VIGVGGGGGNAINR      3
#> 4 Spyo_Exp3652_DDB_SeqID_515305      FYDPGHVMLK      3
#> 5 Spyo_Exp3652_DDB_SeqID_325124     ATDDAIKEIDR      3
#> 6 Spyo_Exp3652_DDB_SeqID_520062      YHSGDYVFVK      3
#>                                                                                                                                                 aggr_Fragment_Annotation
#> 1                            58421_TTLHQAILMGR/3_y4;58422_TTLHQAILMGR/3_y3;58423_TTLHQAILMGR/3_b6;58424_TTLHQAILMGR/3_y9_2;58425_TTLHQAILMGR/3_y7;58426_TTLHQAILMGR/3_b7
#> 2                                                      58499_SVLEELK/2_y5;58500_SVLEELK/2_y3;58501_SVLEELK/2_y4;58502_SVLEELK/2_y3;58503_SVLEELK/2_y4;58504_SVLEELK/2_y3
#> 3 58595_VIGVGGGGGNAINR/3_b13+18_2;58596_VIGVGGGGGNAINR/3_y4;58597_VIGVGGGGGNAINR/3_y10;58598_VIGVGGGGGNAINR/3_y12_2;58599_VIGVGGGGGNAINR/3_b12;58600_VIGVGGGGGNAINR/3_b7
#> 4                              58955_FYDPGHVMLK/3_y7_2;58956_FYDPGHVMLK/3_y8_2;58957_FYDPGHVMLK/3_y7;58958_FYDPGHVMLK/3_b3;58959_FYDPGHVMLK/3_y9_2;58960_FYDPGHVMLK/3_b2
#> 5                       59453_ATDDAIKEIDR/3_y9_2;59454_ATDDAIKEIDR/3_y10_2;59455_ATDDAIKEIDR/3_y5;59456_ATDDAIKEIDR/3_y4;59457_ATDDAIKEIDR/3_y6;59458_ATDDAIKEIDR/3_y8_2
#> 6                                    59657_YHSGDYVFVK/3_y3;59658_YHSGDYVFVK/3_b6;59659_YHSGDYVFVK/3_b2;59660_YHSGDYVFVK/3_b5;59661_YHSGDYVFVK/3_y4;59662_YHSGDYVFVK/3_a6
#>                                                                  aggr_Peak_Area
#> 1         3939.000000;3895.000000;1580.000000;770.000000;1101.000000;730.000000
#> 2      11139.000000;1968.000000;1632.000000;1975.000000;1001.000000;1913.000000
#> 3       3275.000000;8378.000000;3175.000000;3392.000000;1804.000000;1933.000000
#> 4      32275.000000;4911.000000;3415.000000;3550.000000;4480.000000;3413.000000
#> 5 65727.000000;45606.000000;45401.000000;34751.000000;13926.000000;13518.000000
#> 6          1979.000000;1204.000000;1502.000000;810.000000;670.000000;160.000000
#>           transition_group_id decoy      m_score       RT
#> 1    10094_TTLHQAILMGR/3_run0 FALSE 1.388132e-03 2764.405
#> 2        10107_SVLEELK/2_run0 FALSE 1.365587e-05 2785.501
#> 3 10123_VIGVGGGGGNAINR/3_run0 FALSE 4.066284e-04 2150.922
#> 4     10185_FYDPGHVMLK/3_run0 FALSE 5.755066e-05 3056.339
#> 5    10269_ATDDAIKEIDR/3_run0 FALSE 1.073568e-07 2160.130
#> 6     10303_YHSGDYVFVK/3_run0 FALSE 1.607579e-06 2628.328
#>                                                                        align_origfilename
#> 1 /media/data/tmp/strep_align/Strep0_Repl1_R02/split_hroest_K120808_all_peakgroups.xls.gz
#> 2 /media/data/tmp/strep_align/Strep0_Repl1_R02/split_hroest_K120808_all_peakgroups.xls.gz
#> 3 /media/data/tmp/strep_align/Strep0_Repl1_R02/split_hroest_K120808_all_peakgroups.xls.gz
#> 4 /media/data/tmp/strep_align/Strep0_Repl1_R02/split_hroest_K120808_all_peakgroups.xls.gz
#> 5 /media/data/tmp/strep_align/Strep0_Repl1_R02/split_hroest_K120808_all_peakgroups.xls.gz
#> 6 /media/data/tmp/strep_align/Strep0_Repl1_R02/split_hroest_K120808_all_peakgroups.xls.gz
#>   Intensity       Sequence   delta_rt
#> 1     12015    TTLHQAILMGR 21.6801501
#> 2     19628        SVLEELK  4.8352308
#> 3     21957 VIGVGGGGGNAINR 18.6908854
#> 4     52044     FYDPGHVMLK -3.7049072
#> 5    218929    ATDDAIKEIDR  0.3056915
#> 6      6325     YHSGDYVFVK  6.3220861
 str(data)
#> 'data.frame':	38272 obs. of  13 variables:
#>  $ ProteinName             : chr  "Spyo_Exp3652_DDB_SeqID_520043" "Spyo_Exp3652_DDB_SeqID_515468" "Spyo_Exp3652_DDB_SeqID_325989" "Spyo_Exp3652_DDB_SeqID_515305" ...
#>  $ FullPeptideName         : chr  "TTLHQAILMGR" "SVLEELK" "VIGVGGGGGNAINR" "FYDPGHVMLK" ...
#>  $ Charge                  : int  3 2 3 3 3 3 3 3 2 3 ...
#>  $ aggr_Fragment_Annotation: chr  "58421_TTLHQAILMGR/3_y4;58422_TTLHQAILMGR/3_y3;58423_TTLHQAILMGR/3_b6;58424_TTLHQAILMGR/3_y9_2;58425_TTLHQAILMGR"| __truncated__ "58499_SVLEELK/2_y5;58500_SVLEELK/2_y3;58501_SVLEELK/2_y4;58502_SVLEELK/2_y3;58503_SVLEELK/2_y4;58504_SVLEELK/2_y3" "58595_VIGVGGGGGNAINR/3_b13+18_2;58596_VIGVGGGGGNAINR/3_y4;58597_VIGVGGGGGNAINR/3_y10;58598_VIGVGGGGGNAINR/3_y12"| __truncated__ "58955_FYDPGHVMLK/3_y7_2;58956_FYDPGHVMLK/3_y8_2;58957_FYDPGHVMLK/3_y7;58958_FYDPGHVMLK/3_b3;58959_FYDPGHVMLK/3_"| __truncated__ ...
#>  $ aggr_Peak_Area          : chr  "3939.000000;3895.000000;1580.000000;770.000000;1101.000000;730.000000" "11139.000000;1968.000000;1632.000000;1975.000000;1001.000000;1913.000000" "3275.000000;8378.000000;3175.000000;3392.000000;1804.000000;1933.000000" "32275.000000;4911.000000;3415.000000;3550.000000;4480.000000;3413.000000" ...
#>  $ transition_group_id     : chr  "10094_TTLHQAILMGR/3_run0" "10107_SVLEELK/2_run0" "10123_VIGVGGGGGNAINR/3_run0" "10185_FYDPGHVMLK/3_run0" ...
#>  $ decoy                   : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
#>  $ m_score                 : num  1.39e-03 1.37e-05 4.07e-04 5.76e-05 1.07e-07 ...
#>  $ RT                      : num  2764 2786 2151 3056 2160 ...
#>  $ align_origfilename      : chr  "/media/data/tmp/strep_align/Strep0_Repl1_R02/split_hroest_K120808_all_peakgroups.xls.gz" "/media/data/tmp/strep_align/Strep0_Repl1_R02/split_hroest_K120808_all_peakgroups.xls.gz" "/media/data/tmp/strep_align/Strep0_Repl1_R02/split_hroest_K120808_all_peakgroups.xls.gz" "/media/data/tmp/strep_align/Strep0_Repl1_R02/split_hroest_K120808_all_peakgroups.xls.gz" ...
#>  $ Intensity               : int  12015 19628 21957 52044 218929 6325 84020 75170 16479 138165 ...
#>  $ Sequence                : chr  "TTLHQAILMGR" "SVLEELK" "VIGVGGGGGNAINR" "FYDPGHVMLK" ...
#>  $ delta_rt                : num  21.68 4.835 18.691 -3.705 0.306 ...