written by Eric J. Ma on 2016-03-13 | tags: python data science statistics R
I've heard this refrain many times. However, the distinction never really made sense to me.
R and Python are merely programming languages. You don't have to do stats in R and data processing in Python. You can do data processing in R, and statistics in Python.
What is R? It's a programming language designed by statisticians, and so there are tons of one-liner functions to do stats easily.
What is Python? It's a programming language that's really well-designed for general-purpose computing, so it's really expressive, and others can build tools on top of it.
What is data processing? I don't think I can do justice to its definition here, but I'll offer my own simple take: making data usable for other programming functions.
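As a rough illustration of "making data usable", here is a minimal sketch using only Python's standard library. The raw text, the column names, and the `clean_rows` helper are all made up for this example; the point is only that messy input goes in and typed, predictable records come out.

```python
import csv
import io

# Hypothetical raw input: stray whitespace and one missing measurement.
RAW = """sample, group, measurement
s1, control, 4.1
s2, treated, 5.3
s3, treated,
s4, control, 3.9
"""


def clean_rows(raw_text):
    """Parse raw CSV text into dicts with typed values.

    Rows without a measurement are dropped, which is an assumption
    made for this sketch, not a general recommendation.
    """
    reader = csv.DictReader(io.StringIO(raw_text), skipinitialspace=True)
    rows = []
    for row in reader:
        value = (row["measurement"] or "").strip()
        if not value:
            continue  # skip rows with no measurement
        rows.append(
            {
                "sample": row["sample"].strip(),
                "group": row["group"].strip(),
                "measurement": float(value),  # text -> number
            }
        )
    return rows


if __name__ == "__main__":
    for record in clean_rows(RAW):
        print(record)
```

Once the records look like this, any downstream function (a plot, a model, a summary) can consume them without caring how untidy the original file was.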
What is statistics? I think statistics, at its core, is really about describing/summarizing data, and figuring out how probable it is that our data came from some model of randomness. That's all it is, and it's all about playing with numbers, really. There's nothing more than that. Technically, you can do statistics in any programming language, because technically, all programming languages deal with numbers...
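To make that concrete, here is a minimal sketch in plain Python, with no statistics-specific ecosystem beyond the standard library. The function names `describe` and `permutation_p_value` and the sample numbers are invented for illustration. The "model of randomness" assumed here is that group labels are interchangeable, which is what a permutation test checks; it is one possible choice, not the only one.

```python
import random
import statistics


def describe(values):
    """Summarize a sample with a few basic numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Estimate how often shuffled group labels give a difference in
    means at least as large as the one observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a = pooled[: len(a)]
        shuffled_b = pooled[len(a):]
        diff = abs(statistics.mean(shuffled_a) - statistics.mean(shuffled_b))
        if diff >= observed:
            hits += 1
    return hits / n_permutations


if __name__ == "__main__":
    # Made-up measurements for two groups.
    control = [1.0, 1.1, 0.9, 1.2, 1.0]
    treated = [5.0, 5.1, 4.9, 5.2, 5.0]
    print(describe(control))
    print(describe(treated))
    print(permutation_p_value(control, treated))
```

Nothing here is specific to Python: the same shuffling-and-counting loop could be written in R or any other language that can handle numbers and a random number generator.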
Which brings me to the point I want to make - as long as you have data, and you're doing data science, you technically can use any language for it; the differences are not in the language itself, but in the ecosystem, ease-of-use, and other aspects.
Other bloggers have written about the benefits of using a single language, which include:
After all, as Wes McKinney wrote, the real problem isn't R vs. Python. It's the ability to move data seamlessly.
@article{
ericmjl-2016-r-for-statistics-python-for-data-processing,
author = {Eric J. Ma},
title = {R for Statistics, Python for Data Processing?},
year = {2016},
month = {03},
day = {13},
howpublished = {\url{https://ericmjl.github.io}},
journal = {Eric J. Ma's Blog},
url = {https://ericmjl.github.io/blog/2016/3/13/r-for-statistics-python-for-data-processing},
}