Eric J Ma's Website

How I made a local pre-commit hook to resize images

written by Eric J. Ma on 2023-10-14 | tags: pre-commit pre-commit hook automation python python script software development data science dalle-3 til

In this blog post, I share my experience creating a custom pre-commit hook for resizing images within a repository. This hook automates the process of ensuring all logos meet a defined width, saving time and maintaining consistency. It uses Python and integrates with the pre-commit framework, running in an isolated environment to keep the main project clean. I also discuss the potential of distributing these hooks for wider use. Curious about how you can automate checks and streamline your development process with pre-commit hooks?

Read on... (667 words, approximately 4 minutes reading time)

When to write tests for your data science code

written by Eric J. Ma on 2023-10-10 | tags: datascience testing machine learning best practices production research exploratory analysis

In this blog post, I discuss the importance of testing in data science code. I explain how research code can transition into production code and the potential implications of errors or oversights. I suggest three levels of testing: adding assertions within your notebook, migrating code into functions and testing them, and refactoring code into a library with associated test functions. By testing and refactoring our code, we can ensure its accuracy and reliability. Are you curious to see how you can test your code as a data scientist?
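As a minimal sketch of the first two levels (the function `normalize` and its inputs are hypothetical illustrations, not code from the post): an inline assertion of the kind you might drop into a notebook cell, followed by the same logic moved into a function with its own test.

```python
def normalize(values):
    """Scale a list of numbers so they sum to 1 (hypothetical example)."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


# Level 1: an inline assertion, as you might write in a notebook cell.
weights = normalize([1, 1, 2])
assert abs(sum(weights) - 1.0) < 1e-9


# Level 2: the same check as a test function that pytest could collect.
def test_normalize_sums_to_one():
    result = normalize([2, 3, 5])
    assert abs(sum(result) - 1.0) < 1e-9
    assert result == [0.2, 0.3, 0.5]
```

The third level would move `normalize` into an importable library module and keep the test function in a separate test file.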

Read on... (842 words, approximately 5 minutes reading time)

Check docstrings blazing fast with pydoclint

written by Eric J. Ma on 2023-10-09 | tags: coding documentation darglint docstrings tools technologies pydoclint pyjanitor continuous integration til

In this blog post, I discuss the importance of documenting code and the risks of using outdated tools like darglint. I introduce pydoclint as a faster alternative and share a case study of how it solved a problem for the pyjanitor project. I provide instructions on getting started with pydoclint and highlight its default configurations. As a data scientist and tool developer, I'm always on the lookout for better tools, and pydoclint promises a smoother experience. Are you ready to embrace the future with pydoclint?

Read on... (394 words, approximately 2 minutes reading time)

It's time to upgrade to Ruff

written by Eric J. Ma on 2023-10-09 | tags: python ruff tips and tricks rust pre-commit

In this blog post, I discuss the benefits of using Ruff, a blazing fast linter for Python code. With its speed and performance, Ruff can significantly reduce linting and code style checking times. It is written in Rust, known for its performance and safety features. I provide step-by-step instructions on how to integrate Ruff into your workflow, including installing the pre-commit hook and configuring Ruff in pyproject.toml. If you're looking to improve the quality and efficiency of your Python codebase, give Ruff a try. Are you ready to switch to Ruff and experience lightning-fast code checking?

Read on... (349 words, approximately 2 minutes reading time)

VSCode Tip: Cmd+P lets you switch to any file within a repository

written by Eric J. Ma on 2023-10-08 | tags: vscode tips and tricks til navigation repository productivity

In this blog post, I share a quick tip for using VSCode. I show how to easily locate and open any file within a repository using the Command Palette. By typing keywords in the file-browsing mode, you can quickly narrow down the exact file you want to open. Have you ever struggled to find a specific file in VSCode? Read on to discover this time-saving trick!

Read on... (109 words, approximately 1 minute reading time)

How to choose a (conda) distribution of Python

written by Eric J. Ma on 2023-10-07 | tags: conda anaconda miniforge python distribution data science pip tooling python

In this blog post, I discuss the differences between the Anaconda, Miniconda, and Miniforge distributions of Python. Anaconda is the official distribution from Anaconda, Inc. and comes with a wide range of data science packages. Miniconda is a smaller version of Anaconda, intended for use in Docker containers. Miniforge, developed by the conda-forge team, pulls packages from the conda-forge repository and includes mamba. The choice of distribution depends on your needs and preferences: Miniforge is recommended for lightweight, open-source use, and Anaconda if you want enterprise support and want to back the Python open source world.

Read on... (790 words, approximately 4 minutes reading time)

How to use Python functions as a template engine for prompts

written by Eric J. Ma on 2023-10-06 | tags: python llm gpt-4 coding outlines llamabot jinja2 prompt management chatbots

In this blog post, I explore the use of Outlines for prompt management in Python, specifically for LlamaBot. However, due to its heavy dependencies, I decided to reimplement the functionality using GPT-4 as a coding aid. The result was a successful reimplementation that allowed me to organize prompts within .py source modules more easily. A lesson from this experience is the importance of clarity in programming, even when we're using LLMs to help us code.
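To give a rough idea of what "functions as templates" can look like, here is a minimal sketch. It is not the post's implementation: the `prompt` decorator and the `summarize` function are hypothetical names, and the sketch uses plain `str.format` placeholders as a stand-in for Jinja2 templating to stay dependency-free.

```python
import inspect
from functools import wraps


def prompt(func):
    """Turn a function's docstring into a prompt template (hypothetical helper)."""
    signature = inspect.signature(func)
    # Strip the docstring's indentation and surrounding blank lines.
    template = inspect.cleandoc(func.__doc__ or "")

    @wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments onto parameter names,
        # then fill in defaults for anything the caller left out.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        return template.format(**bound.arguments)

    return wrapper


@prompt
def summarize(text, max_words=50):
    """Summarize the following text in at most {max_words} words:

    {text}
    """


print(summarize("Pre-commit hooks automate repository checks.", max_words=20))
```

The function body stays empty; the docstring is the prompt, and the signature documents which variables the prompt expects.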

Read on... (913 words, approximately 5 minutes reading time)

Shape Up and Data Science: A Match Closer to Agile Than You Think

written by Eric J. Ma on 2023-10-05 | tags: data science agile scrum shape up software methodologies product development deep work team autonomy adaptability

In this blog post, I explore the limitations of Scrum for data science and introduce Shape Up as a potential alternative. I discuss how Shape Up's ways of working align better with the unique needs of data science, such as deep domain specialization and varied feedback durations. I also highlight how Shape Up embodies Agile's core values while suggesting modifications to suit data science projects better. Ultimately, I emphasize the importance of adaptability and delivering value, staying true to Agile's core principles.

Read on... (1940 words, approximately 10 minutes reading time)

How automating git workflows improves data scientists

written by Eric J. Ma on 2023-09-30 | tags: automation git commit messages release notes data workflow data science jupyter notebook lab notebook

In this blog post, I discuss the importance of commit messages for data scientists and how automated commit message writers can improve their workflows. I highlight the psychological barrier of committing in-progress work and the benefits of having informative commit logs. By using automatic commit message generation, data scientists can create a digital lab notebook that summarizes their work and aids in resuming tasks. This blog post emphasizes the value of good commit logs in maximizing productivity for data scientists.

Read on... (474 words, approximately 3 minutes reading time)

How to crisp up your resume with ChatGPT

written by Eric J. Ma on 2023-09-26 | tags: resume career development gpt large language models chatgpt

In this blog post, I share my discovery of using ChatGPT and GPT-4 to enhance a PhD student's resume. By using the prompt and an interactive process, you can efficiently condense bullet points without losing important information. I explain how the AI model suggests rephrasing and offer tips on how to further shorten bullet points.

Read on... (118 words, approximately 1 minute reading time)