Eric J Ma's Website

How LLMs can accelerate data science

written by Eric J. Ma on 2024-04-17 | tags: bioit world conference data science llms software development productivity tools ai training code completion debugging documentation commit messages

In this blog post, I share insights from my talk at the BioIT World conference in 2024, focusing on how LLMs empower data scientists and the necessity of software development skills in data science. I discuss practical applications of LLMs, such as code completion, documentation, debugging, and learning new domains, highlighting their role in enhancing productivity and efficiency. LLMs not only automate mundane tasks but also facilitate rapid knowledge acquisition, proving to be invaluable tools for data science teams. How could LLMs transform your data science work?

Read on... (2053 words, approximately 11 minutes reading time)
How to make distributable pre-commit hooks

written by Eric J. Ma on 2024-04-09 | tags: pre-commit webp optimization python

In this blog post, I share my journey of creating my first distributable pre-commit hook, convert-to-webp, using the pre-commit framework. This hook automatically converts images to the .webp format before they're committed to a repository, ensuring optimized image storage. I detail the essential configuration files, the creation of a Typer CLI for the hook, and how to make the hook available for others by tagging versions and adding it to a project's .pre-commit-config.yaml file. Curious about how to streamline your codebase with automated checks? How might this improve your project's efficiency?

Read on... (858 words, approximately 5 minutes reading time)
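
As a rough illustration of the kind of hook described above, here is a minimal sketch of the file-selection logic. It is not the actual convert-to-webp implementation: it uses the standard library's argparse in place of the Typer CLI mentioned in the post, the `CONVERTIBLE` extension set and the `plan_conversions` helper are hypothetical names, and the image conversion step itself is left as a comment.

```python
import argparse
from pathlib import Path

# Hypothetical set of extensions to convert; the post does not list them.
CONVERTIBLE = {".png", ".jpg", ".jpeg"}


def plan_conversions(filenames):
    """Pair each convertible image path with its .webp target path."""
    plan = []
    for name in filenames:
        path = Path(name)
        if path.suffix.lower() in CONVERTIBLE:
            plan.append((path, path.with_suffix(".webp")))
    return plan


def main(argv=None):
    """Entry point sketch; a packaged hook would expose this as a console script."""
    parser = argparse.ArgumentParser(description="Sketch of a convert-to-webp style hook.")
    # pre-commit passes the staged files matching the hook's filters as positional arguments.
    parser.add_argument("filenames", nargs="*")
    args = parser.parse_args(argv)

    plan = plan_conversions(args.filenames)
    for src, dst in plan:
        # A real hook would convert the image here (for example with an imaging library).
        print(f"would convert {src} -> {dst}")

    # A nonzero exit code makes pre-commit report the hook as failed, so the commit stops for review.
    return 1 if plan else 0
```

Calling `main(["logo.png", "README.md"])` prints one planned conversion and returns 1, while a file list with no matching images returns 0 and lets the commit proceed.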
pyds-cli version 0.4.0 released!

written by Eric J. Ma on 2024-04-07 | tags: pyds-cli data science standards cookiecutter templates github actions

In this blog post, I share the latest updates to pyds-cli, including the use of cookiecutter templates for easy repo scaffolding and a new talks initializer for creating talk presentations using reveal-md. These updates simplify the CLI and offer a streamlined approach to project and talk setup, reflecting my commitment to promoting best practices among data scientists. With these tools, I aim to make it easier for data scientists to adopt standardized project structures. Curious about how these updates can enhance your workflow?

Read on... (832 words, approximately 5 minutes reading time)
How to grow software development skills in a data science team

written by Eric J. Ma on 2024-04-05 | tags: data science data science team software development upskilling tooling environment productivity

In this blog post, I share insights from my 7 years in the industry on how to enhance a data science team's software development skills, focusing on the necessity of tooling and practices that make it easy and normal to do the right thing: moving from notebook explorations to production-ready code. I also discuss the importance of community practices in fostering a culture of quality software development within data science teams. How can these strategies streamline your team's workflow and elevate their software development capabilities?

Read on... (2356 words, approximately 12 minutes reading time)
Llamabot 0.4.0 Released!

written by Eric J. Ma on 2024-03-24 | tags: llamabot querybot refactor chromadb lancedb vector database hybrid search chatui mixin panel llamabot repo chat litellm contributions open source

In this blog post, I share the latest updates in LlamaBot 0.4.0, highlighting the decoupling of document storage from text generation in QueryBot, the introduction of the ChatUIMixin for easy web UI integration, and the switch to LanceDB for its lightweight, SQLite-like handling of vectors. I also touch on enhancements to repo chat, making it simpler to launch web-based chatbots on repository content. If you're a llamabot user, I'd love to hear from you about how well it works for you!

Read on... (1150 words, approximately 6 minutes reading time)
How to organize and motivate a biotech data science team

written by Eric J. Ma on 2024-03-23 | tags: data science organization motivation research biotech team activities product-oriented service-oriented career development

In this blog post, I discuss how to organize and motivate a data science team within a biotech research setting, focusing on structuring team activities around key research entities and methodologies. I highlight the importance of aligning team members with projects that match their interests and professional goals, and suggest ways to foster leadership skills without formal management roles. How do we balance the technical and career aspirations of data scientists to maintain productivity and motivation?

Read on... (1237 words, approximately 7 minutes reading time)
Mixtral-8x7b-Instruct works on an old GTX1080!

written by Eric J. Ma on 2024-03-10 | tags: mixtral 8x7b-instruct old gpu linux tower 4-bit quantized llama bot keyword generator protein engineering machine learning older commodity hardware

In this blog post, I share my experience running the Mixtral 8x7b-Instruct model on my old Linux GPU tower. I used the 4-bit quantized model and was pleasantly surprised that it worked. I generated keywords for a paper on protein engineering and machine learning using the model, and the results were comparable to GPT-4. Although the model was slower than running mistral-7b, it was still functional on older hardware. Have you tried running large language models on older hardware? Read on to find out more about my experience.

Read on... (326 words, approximately 2 minutes reading time)
From Academia to Industry: Career Advice from MIT Industry Careers Panel

written by Eric J. Ma on 2024-03-09 | tags: career panel professional development phd advice networking job search public profile portfolio management work life balance

In this blog post, I share insights from a career panel at MIT where we discussed advice for Ph.D. students about to graduate. We covered the importance of studying job postings, networking effectively, maintaining a public profile, and understanding business needs when applying for jobs. We also touched on the value of a publicly viewable portfolio and the challenge of balancing work and home life. What other advice have you heard?

Read on... (1146 words, approximately 6 minutes reading time)
Your first 90 days at work - what should you do?

written by Eric J. Ma on 2024-02-29 | tags: career development productivity tips time management brag doc professional development first job

In this blog post, I share advice for those starting a new job, focusing on the first 90 days. I discuss the importance of automating your calendar, recording your accomplishments, building a company committee, and choosing a manageable project with tangible impact. These strategies can help you gain control over your career direction and make a positive impression in your new role. How can you apply these tips to your own career journey?

Read on... (1485 words, approximately 8 minutes reading time)
How to keep sharp with technical skills as a data science team lead

written by Eric J. Ma on 2024-02-25 | tags: data science leadership coaching mentorship continuous learning technical skills machine learning pair coding code review

In this blog post, I share my strategies for maintaining technical skills as a data science team lead. Balancing management duties with technical tasks, I use strategies like performing lower-level tasks, pair coding, code reviews, asking questions, and prototyping. These methods help me stay sharp and credible with my team, while also fostering personal growth. How do you keep your technical skills up-to-date?

Read on... (1007 words, approximately 6 minutes reading time)