Using biomaRt in Python - a quick rpy2 tutorial
Recently, while analyzing a transcriptome, I wanted to retrieve the gene IDs and biotypes for a list of Drosophila transcript IDs using the R package biomaRt. Since I did the rest of my data wrangling in Python, I wanted to run everything in Spyder rather than switch to RStudio just for this step.
I had previously used Python and R in the same Jupyter Notebook, via either the polyglot SoS Notebook or rmagic, but the Jupyter Notebook structure is just not suited for first-round analysis (more on that in a later post). I had also tried Python interfaces to Biomart, such as pybiomart and biomartpy, but neither was as easy to use as the original biomaRt R package.
So I decided to get the Python-R interface rpy2 working in my Python script in Spyder. It took some tinkering and some Googling, but I got something cooking.
1. Import libraries
from rpy2.robjects.vectors import StrVector
from rpy2.robjects import pandas2ri
from rpy2.robjects import r as R
pandas2ri.activate()
2. Import Pandas dataframe containing the Drosophila transcript IDs into R
r_DM_tIDs = pandas2ri.py2ri(DM_tIDs)
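The py2ri name above comes from rpy2 2.x; in rpy2 3.x the conversion function was renamed py2rpy. Here is a minimal sketch of the same step for rpy2 3.x, assuming rpy2 and pandas are installed. The helper names unique_ids and to_r_dataframe are my own, not part of rpy2, and the de-duplication step is an optional extra rather than something biomaRt requires.

```python
def unique_ids(ids):
    """Drop missing values and duplicates while keeping first-seen order."""
    seen = set()
    cleaned = []
    for raw in ids:
        # Skip None and NaN (NaN is the only value not equal to itself).
        if raw is None or raw != raw:
            continue
        value = str(raw).strip()
        if value and value not in seen:
            seen.add(value)
            cleaned.append(value)
    return cleaned


def to_r_dataframe(df):
    """Convert a pandas DataFrame to an R data.frame (assumes rpy2 3.x)."""
    # Imported lazily so this module still loads where R is not installed.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    converter = ro.default_converter + pandas2ri.converter
    # The scoped converter replaces the global pandas2ri.activate() call.
    with localconverter(converter):
        return converter.py2rpy(df)
```

With this in place, the conversion step would read r_DM_tIDs = to_r_dataframe(DM_tIDs), and unique_ids can be applied to the ID column beforehand if the list may contain repeats or blanks.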
3. Biomart analysis
# Load biomaRt in the embedded R session
R.library("biomaRt")
# Connect to the Ensembl Drosophila melanogaster gene dataset
mart = R.useMart(biomart="ensembl",
                 dataset="dmelanogaster_gene_ensembl",
                 host="www.ensembl.org")
# Look up gene IDs, names and biotypes for each transcript ID
DM_BM = R.getBM(attributes=StrVector(("flybase_transcript_id",
                                      "ensembl_gene_id",
                                      "external_gene_name",
                                      "transcript_biotype",
                                      "gene_biotype")),
                filters="flybase_transcript_id",
                values=r_DM_tIDs,
                mart=mart)
Note: The key thing to figure out was that a StrVector object is needed to convert the list of attributes into a string vector usable in R, thanks to this StackOverflow post.
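To make that wrapping step explicit, here is a small sketch of a helper that checks its input and then builds the R character vector. The name as_str_vector is hypothetical (only StrVector itself comes from rpy2), and the type check is my own addition so that a stray non-string fails in Python rather than inside R.

```python
def as_str_vector(values):
    """Wrap a sequence of Python strings in an R character vector."""
    values = list(values)
    bad = [v for v in values if not isinstance(v, str)]
    if bad:
        raise TypeError(f"expected only strings, got: {bad!r}")
    # Imported lazily so the check above works even where R is not installed.
    from rpy2.robjects.vectors import StrVector
    return StrVector(values)
```

The attributes argument could then be written as as_str_vector(["flybase_transcript_id", "ensembl_gene_id"]), using a plain Python list.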
4. Export result back into a Pandas dataframe
DM_BM_py = pandas2ri.ri2py(DM_BM)
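As with py2ri, the ri2py name is from rpy2 2.x; in rpy2 3.x it became rpy2py. Below is a sketch of the return trip for rpy2 3.x, plus a check I find handy because a Biomart query only returns rows for IDs it could match. Both helper names (to_pandas_dataframe, unmatched_ids) are my own and not part of rpy2 or biomaRt.

```python
def to_pandas_dataframe(r_df):
    """Convert an R data.frame to a pandas DataFrame (assumes rpy2 3.x)."""
    # Imported lazily so this module still loads where R is not installed.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    converter = ro.default_converter + pandas2ri.converter
    with localconverter(converter):
        return converter.rpy2py(r_df)


def unmatched_ids(queried, returned):
    """Return the queried IDs that are absent from the result, in query order."""
    found = set(returned)
    return [tid for tid in queried if tid not in found]
```

Assuming the transcript IDs sit in the first column of DM_tIDs, something like unmatched_ids(DM_tIDs.iloc[:, 0], DM_BM_py["flybase_transcript_id"]) would list the transcripts that came back without an annotation.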
This is a very simple example of using rpy2 to integrate Python and R, but one that I have not seen posted anywhere. Since Biomart is an important part of many bioinformatics workflows, I think it is useful to see it worked out from start to finish.
Here are two more in-depth tutorials on using rpy2 (not specific to bioinformatics), which I plan to study: