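Converts the protein identifiers in a data frame or file from one identifier type to another (by default from UniProt/Swiss-Prot accessions to HGNC symbols), using a biomaRt database.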

convert_protein_ids(
  data_table,
  column_name = "Protein",
  species = "hsapiens_gene_ensembl",
  host = "www.ensembl.org",
  mart = "ENSEMBL_MART_ENSEMBL",
  ID1 = "uniprotswissprot",
  ID2 = "hgnc_symbol",
  id.separator = "/",
  copy_nonconverted = TRUE,
  verbose = FALSE
)

Arguments

data_table

A data frame or file name.

column_name

Name of the column that contains the original protein identifiers.

species

The species of the protein identifiers, given as the dataset name used by biomaRt (e.g. "hsapiens_gene_ensembl", "mmusculus_gene_ensembl", "drerio_gene_ensembl").

host

Host of the biomaRt database (e.g. "www.ensembl.org", "dec2017.archive.ensembl.org").

mart

The type of mart (e.g. "ENSEMBL_MART_ENSEMBL").

ID1

The type of the original protein identifiers (e.g. "uniprotswissprot", "ensembl_peptide_id").

ID2

The type of the converted protein identifiers (e.g. "hgnc_symbol", "mgi_symbol", "external_gene_name").

id.separator

Separator between protein identifiers of shared peptides.

copy_nonconverted

Logical indicating whether identifiers that cannot be converted should be copied unchanged.

verbose

Logical indicating whether to write a file containing the version of the database used.
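
The species, host, mart, ID1 and ID2 arguments read like the pieces of a biomaRt query. The following is a minimal sketch of such a query, assuming the lookup is done with biomaRt::useMart() and biomaRt::getBM(); this page does not confirm how the function is implemented, and lookup_ids is a hypothetical helper that is not part of the package. Depending on the installed biomaRt version, the host may need an "https://" prefix.

```r
# Hypothetical helper (not part of the package): query a biomaRt
# database for a mapping from one identifier type (ID1) to another (ID2).
lookup_ids <- function(ids,
                       species = "hsapiens_gene_ensembl",
                       host = "www.ensembl.org",
                       mart = "ENSEMBL_MART_ENSEMBL",
                       ID1 = "uniprotswissprot",
                       ID2 = "hgnc_symbol") {
  # Connect to the chosen mart and species dataset on the given host.
  ensembl <- biomaRt::useMart(biomart = mart, dataset = species, host = host)
  # Return a data frame with one column per requested attribute.
  biomaRt::getBM(
    attributes = c(ID1, ID2),
    filters = ID1,
    values = ids,
    mart = ensembl
  )
}

if (FALSE) {
  # Needs the biomaRt package and network access.
  lookup_ids(c("Q01581", "P49327"))
}
```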

Value

The data frame with an added column of the converted protein identifiers.

Note

Protein identifiers from shared peptides should be separated by a forward slash. The host of an archived Ensembl database can also be supplied through the host argument (e.g. "dec2017.archive.ensembl.org").
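
As an illustration of how slash-separated entries and copy_nonconverted might interact, here is a small base R sketch. It is not the package's implementation: convert_entry is a hypothetical helper, the lookup vector is a hand-written stand-in for a database result, and "X00000" is a made-up accession with no mapping.

```r
# Hand-written stand-in for a database lookup (illustration only).
lookup <- c(Q01581 = "HMGCS1", P49327 = "FASN",
            P63261 = "ACTG1", P60709 = "ACTB")

# Hypothetical helper: convert one entry that may hold several identifiers.
convert_entry <- function(entry, lookup, sep = "/", copy_nonconverted = TRUE) {
  # Split the entry into its individual identifiers.
  ids <- strsplit(entry, sep, fixed = TRUE)[[1]]
  # Look each identifier up by name; unknown ones come back as NA.
  converted <- unname(lookup[ids])
  missing <- is.na(converted)
  if (copy_nonconverted) {
    # Keep the original text for anything that could not be converted.
    converted[missing] <- ids[missing]
  }
  # Drop whatever is still unconverted and rejoin with the separator.
  paste(converted[!is.na(converted)], collapse = sep)
}

entries <- c("Q01581", "P49327", "2/P63261/P60709", "X00000")
converted <- vapply(entries, convert_entry, character(1),
                    lookup = lookup, USE.NAMES = FALSE)
print(converted)
```

In this sketch the leading "2" of the shared-peptide entry has no mapping, so it is carried over only because copy_nonconverted is TRUE; the real function may treat that prefix differently.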

Author

Peter Blattmann

Examples

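if (FALSE) {
  # The third entry comes from a shared peptide: its identifiers
  # are separated by "/" (the default id.separator).
  data_table <- data.frame(
    Protein = c("Q01581", "P49327", "2/P63261/P60709"),
    Abundance = c(100, 3390, 43423)
  )
  convert_protein_ids(data_table)
}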