Python or R? Not a Question Anymore

Hong Kong shares a land border with Mainland China, featuring nine major land-based control points. Most drivers on the two sides, however, do not attempt to cross the border, partly due to insurance complexity across the two jurisdictions. There is a more important reason: Hong Kong uses left-hand traffic, following the U.K., while the Mainland uses right-hand traffic, as in most continental countries. Faint-hearted drivers can easily get confused on the other side. Almost 30 years after Hong Kong’s return, this misalignment remains, and no one has attempted to change it. A solution is foreseeable, though. Once driving becomes fully autonomous, which side the steering wheel is mounted on becomes irrelevant. AI does not sit on the left or right.

Every few years, I write a post about the battle of computing languages. I have been an R user for 20 years, but I decided to switch to Python in 2025 when I revamped my Data Science course. The transition was smooth.

Computing languages are just languages, and Large Language Models excel at languages. That explains why AI shows its strongest performance in coding tasks, now in agentic mode. With LLMs at our side, the difference between computing languages blurs. A few days ago, I tested an AI agent by having it translate one of my fully functioning R packages into a Python replica. In 20 minutes, the job was done, and the agent automatically checked that the two packages produced the same numerical results. The efficiency and accuracy were stunning.
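To give a flavor of what such a numerical check can look like, here is a minimal sketch in Python. It is not the agent's actual procedure: the function name `results_match`, the tolerances, and the sample numbers are all assumptions made for illustration. The idea is simply to compare the two sets of outputs element by element, allowing for tiny floating-point differences.

```python
import math


def results_match(r_values, py_values, rel_tol=1e-8, abs_tol=1e-12):
    """Return True if two sequences of numbers agree element-wise within tolerance.

    The tolerances are illustrative defaults, not values from the original test.
    """
    # Outputs of different lengths cannot be the same result.
    if len(r_values) != len(py_values):
        return False
    # Compare each pair, tolerating small floating-point discrepancies.
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(r_values, py_values)
    )


# Hypothetical numbers standing in for outputs exported from each package.
r_output = [0.5, 1.25, -3.0]
py_output = [0.5, 1.25 + 1e-13, -3.0]
print(results_match(r_output, py_output))
```

Exact equality is usually too strict for this kind of comparison, since two correct implementations can differ in the last few digits; a tolerance-based check is the more forgiving choice.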

I am still a human. As a native R speaker, honestly, I do not speak Python at the same level of proficiency as R. (As an old-style R speaker, I sometimes feel uncomfortable with the tidyverse dialect of R.) Recently, I asked AI to code a demonstration of a new method that I wanted to learn. With too many self-defined classes, the Python version was hard for me to understand. I then deleted the Python notebook, started a new AI conversation, and asked the agent for an R version. It turned out that the R version was much easier (for me) to read and comprehend. With some tests and experiments, it helped me grasp the new method well.

It is a fact that R, as a language developed by and for statisticians, has not ridden the wave of AI and machine learning tasks such as language processing, image processing, and so on. The most popular framework for deep learning is PyTorch, and it is Python-native. As a general-purpose programming language, Python is capable of handling various computer science tasks beyond scientific computing.

I predicted in early 2025 that R would be dominated by Python even in data science. Things have changed in one year. With the rapid advances of vibe coding, people use natural language for their tasks. Though natural language must be translated by AI into a programming language, it is no longer crucial whether the programming language is R or Python. That choice should depend on the following considerations:

  • If the human will read the code, what language is the human most comfortable with?
  • What do the upstream and downstream workflows require? For deep neural networks, Python is a better choice; for day-to-day data analysis, R suffices and is more convenient.

Even though I still prefer to teach students in Python, I no longer feel guilty or ashamed to summon my beloved R for my own jobs.

What about MATLAB or Stata? These brothers have no future.