Eric J Ma's Website

How to automatically write git commit messages

written by Eric J. Ma on 2023-09-23 | tags: commit messages conventional commits git workflow git llamabot python pre-commit software development data science

In this blog post, I discuss how I used LlamaBot, a Pythonic interface to Large Language Models (LLMs), to automatically write git commit messages that follow the Conventional Commits specification. By feeding the git diff into LlamaBot's SimpleBot, I was able to generate informative commit messages that make it easy to track project history and create accurate change logs. I also explain how to install the prepare-commit-msg hook so that LlamaBot runs after the pre-commit hooks and before the commit message is opened for editing. Along the way, I found that interacting with LLMs effectively requires precise and clear thinking.
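As a rough sketch of the workflow described above (not the code from the post itself), a prepare-commit-msg hook could be written along these lines. Git passes the path of the commit message file to the hook as its first argument. The names `staged_diff`, `build_prompt`, `write_commit_message`, and the `generate` callable are hypothetical; `generate` stands in for whatever LLM call you use, such as a LlamaBot SimpleBot, and the prompt wording is an assumption.

```python
import subprocess
from typing import Callable

# Assumed instruction text; the post's actual prompt is not reproduced here.
SYSTEM_PROMPT = (
    "You write git commit messages that follow the "
    "Conventional Commits specification."
)


def staged_diff() -> str:
    """Return the diff of the currently staged changes."""
    result = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def build_prompt(diff: str) -> str:
    """Combine the instructions and the diff into one prompt string."""
    return f"{SYSTEM_PROMPT}\n\nWrite a commit message for this diff:\n\n{diff}"


def write_commit_message(
    msg_file: str, diff: str, generate: Callable[[str], str]
) -> None:
    """Write a generated message into the file that git will open for editing.

    `generate` is a hypothetical stand-in for the LLM call.
    """
    if not diff.strip():
        # Nothing staged, so leave git's default message untouched.
        return
    message = generate(build_prompt(diff))
    with open(msg_file, "w", encoding="utf-8") as f:
        f.write(message.strip() + "\n")
```

In an actual hook script, this would be called as `write_commit_message(sys.argv[1], staged_diff(), generate)`. Keeping the LLM call behind a plain callable means the file-writing logic can be tested without network access.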

Read on... (1320 words, approximately 7 minutes reading time)
Centaurs and Cyborgs: Interacting with Artificial Intelligence Tooling

written by Eric J. Ma on 2023-09-17 | tags: artificial intelligence centaurs cyborgs ai tooling integration biotech research ml models automation

In this blog post, I discuss the concept of Centaurs and Cyborgs in relation to how consultants interact with AI tooling. Centaurs have a clear division of labor between humans and AI, while Cyborgs deeply integrate the two. I explain how I personally work in Centaur mode for tasks like writing blog posts, delegating certain aspects to AI, and in Cyborg mode for modeling work. I also explore how this framework can be applied to integrating ML tooling into biotech research. Overall, the two modes are not mutually exclusive and can be further refined.

Read on... (518 words, approximately 3 minutes reading time)
How to extract query params from FastAPI

written by Eric J. Ma on 2023-09-17 | tags: til htmx fastapi web development programming frontend backend python requests

In this blog post, I share how I learned to extract query parameters in FastAPI. I discovered how to access key-value pairs from a GET request using the request.query_params dictionary. Additionally, I found that the urllib.parse submodule solves the problem of formatting URLs properly. Both were crucial in developing a blog writing assistant with a frontend in HTMX and a backend in FastAPI. Overall, it was a valuable learning experience that I hope will be useful to others as well.
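To illustrate the URL-formatting half of this, here is a minimal sketch that uses only the standard library's urllib.parse. The endpoint path `/draft`, the parameter names, and the helper functions `build_url` and `extract_params` are hypothetical. Inside a FastAPI route, `request.query_params` would give similar key-value access without the manual parsing shown here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(base: str, params: dict[str, str]) -> str:
    """Append properly percent-encoded query parameters to a base URL."""
    return f"{base}?{urlencode(params)}"


def extract_params(url: str) -> dict[str, str]:
    """Parse the query string back into a flat dict (first value per key)."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


# Spaces, commas, and ampersands in values are encoded so the URL stays valid.
url = build_url("/draft", {"title": "Hello, world!", "tags": "python & htmx"})
params = extract_params(url)
```

Encoding with `urlencode` matters because a raw `&` inside a value would otherwise be read as a separator between parameters.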

Read on... (370 words, approximately 2 minutes reading time)
Article Review: 4 Skills the Next Generation of Data Scientists Needs to Develop

written by Eric J. Ma on 2023-09-09 | tags: data science biotech research machine learning problem spotting problem scoping problem shepherding solution translating laboratory science protein engineering antibody therapies

In this blog post, I reflect on four key skills for data scientists in the biotech field: problem spotting, problem scoping, problem shepherding, and solution translating. These come from an article in the Harvard Business Review. I show by example the need to understand the real issues faced by our collaborators, ask probing questions, maintain regular communication, and, most importantly, speak the language of the audience. These skills are essential for building trust, understanding the underlying science, and developing effective solutions.

Read on... (958 words, approximately 5 minutes reading time)
Interviewing Data Science Candidates with Code Reviews

written by Eric J. Ma on 2023-09-06 | tags: data science hiring interviewing code review coding skills candidate assessment documentation design choices machine learning

In this blog post, I discuss a different approach to evaluating a candidate's coding skills during an interview. I ask them to bring a piece of code they're proud of and conduct a code-review style discussion. This method reveals their standards of excellence, their ability to explain and defend their work, and their thought process behind their design choices. I also share a rubric for assessing coding skills, which includes factors like code organization, documentation, and the candidate's response to feedback.

Read on... (692 words, approximately 4 minutes reading time)
Promotions vs. Bonuses

written by Eric J. Ma on 2023-09-04 | tags: career growth promotion bonus peter principle work rewards motivation morale professional development incentives

In this blog post, I debunk the misconception that promotions are rewards for excellent work at your current level. Instead, I share a framework I learned earlier: promotions should reward sustained excellence demonstrated at a higher level. Bonuses, on the other hand, are the appropriate reward for outstanding work at your current level. But are promotions and bonuses enough as motivators?

Read on... (710 words, approximately 4 minutes reading time)
What's the difference between `setup.cfg`, `pyproject.toml`, and `setup.py`?

written by Eric J. Ma on 2023-08-31 | tags: packaging setup.py setup.cfg python pyproject.toml enhancement proposal project configuration dependencies package management conda project structure

In this blog post, I explored the differences between setup.cfg, pyproject.toml, and setup.py in Python packaging. I explained their historical context and usage, and recommended using pyproject.toml as the setup configuration file for Python packages in 2023. I also discussed the importance of Python packaging for data scientists, and the distinction between environment.yml and pyproject.toml. The former defines a project's development environment, while the latter provides pip with installation and usage information for a Python package that I might be working on.

Read on... (750 words, approximately 4 minutes reading time)
Research code benefits from unit testing

written by Eric J. Ma on 2023-08-30 | tags: code review unit tests research chemistry ml chemistry data splitting property prediction software testing code correctness

In this blog post, I discuss the importance of unit tests in research code. I share an experience from a code review with our intern, Matthieu, where we realized the need for rigorous testing of a non-standard splitting strategy in our ML model. We concluded that even research code, which might be discarded eventually, can benefit from thorough testing to ensure its correctness. This is particularly crucial when the code is used for comparisons or collaborations.
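As an illustration of the kind of test this refers to, here is a minimal sketch. The post does not describe the actual splitting strategy, so `group_split` below is a hypothetical stand-in: it assigns whole groups (for example, molecules that share a scaffold) to either the training or the test set. The test then checks the properties such a split must satisfy: nothing is lost, and no group leaks across the two sets.

```python
import random
from collections import defaultdict


def group_split(items, group_of, test_fraction=0.2, seed=0):
    """Hypothetical splitter: whole groups go to either train or test."""
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)
    # Sort before shuffling so the result depends only on the seed.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])
    train = [x for k in keys if k not in test_keys for x in groups[k]]
    test = [x for k in keys if k in test_keys for x in groups[k]]
    return train, test


def test_group_split_has_no_leakage():
    # Made-up data: 50 items spread evenly over 5 groups.
    items = [(f"mol{i}", f"scaffold{i % 5}") for i in range(50)]
    train, test = group_split(items, group_of=lambda item: item[1])
    # Every item ends up in exactly one of the two sets.
    assert sorted(train + test) == sorted(items)
    # No group appears on both sides of the split.
    train_groups = {group for _, group in train}
    test_groups = {group for _, group in test}
    assert train_groups.isdisjoint(test_groups)
    # Neither side is empty.
    assert train and test
```

A test like this can be run with pytest and takes only a few lines, yet it guards against exactly the kind of silent leakage that would invalidate a model comparison.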

Read on... (473 words, approximately 3 minutes reading time)
Service vs. Product-Oriented Data Science

written by Eric J. Ma on 2023-08-28 | tags: automation biotech collaboration data insights data product data science model building predictive models product oriented protein engineering service oriented software engineering team collaboration tool building

In this blog post, I explore the two flavours of data science work: service-oriented and product-oriented. Service-oriented data science serves others in a one-off fashion, while product-oriented data science builds a reusable tool for a well-defined problem. Both have their value depending on the situation. I discuss the challenges in navigating between the two and emphasize the importance of adopting a product-first orientation. As an individual contributor or team lead, it's crucial to shift from being mere consumers of tooling to makers of tools, enhancing efficiency and scalability!

Read on... (935 words, approximately 5 minutes reading time)
Deploy to Dokku from GitHub Actions

written by Eric J. Ma on 2023-08-27 | tags: til github actions cicd continuous integration continuous delivery dokku digitalocean deployment coding devops cost efficiency dokku server git dokku deployment

In my latest blog post, I share my experience of hosting a Dokku server on DigitalOcean and how I've managed to automate the deployment process using GitHub Actions. I delve into the cost benefits of using Dokku on DigitalOcean over other services like Heroku and Fly.io. I also provide a step-by-step guide on how to configure GitHub Actions to deploy apps to DigitalOcean automatically. If you're interested in saving time and money on app deployment, this post is a must-read.

Read on... (378 words, approximately 2 minutes reading time)