Eric J Ma's Website

Data Science in the Biotech Research Organization

written by Eric J. Ma on 2024-05-05 | tags: data science biotech team management tutorial odsc east mission statement problem solving value delivery hiring challenges leadership

In this blog post, I share discussion insights from a hands-off tutorial I led at ODSC East on setting up a successful data science team within a biotech research organization. We explored formulating a mission, identifying problem classes, articulating value, and addressing challenges. I used my experience at Moderna to illustrate points, emphasizing the unique aspects of biotech data science. Despite not covering all topics due to time constraints, the discussion was enlightening, highlighting the contrast between biotech and other industries. How can these insights apply to your organization's data science team?

Read on... (2871 words, approximately 15 minutes reading time)

How LLMs can accelerate data science

written by Eric J. Ma on 2024-04-17 | tags: bioit world conference data science llms software development productivity tools ai training code completion debugging documentation commit messages

In this blog post, I share insights from my talk at the BioIT World conference in 2024, focusing on how LLMs empower data scientists and the necessity of software development skills in data science. I discuss practical applications of LLMs, such as code completion, documentation, debugging, and learning new domains, highlighting their role in enhancing productivity and efficiency. LLMs not only automate mundane tasks but also facilitate rapid knowledge acquisition, proving to be invaluable tools for data science teams. How could LLMs transform your data science work?

Read on... (2053 words, approximately 11 minutes reading time)

How to make distributable pre-commit hooks

written by Eric J. Ma on 2024-04-09 | tags: pre-commit webp optimization python

In this blog post, I share my journey of creating my first distributable pre-commit hook, convert-to-webp, using the pre-commit framework. This hook automatically converts images to the .webp format before they're committed to a repository, ensuring optimized image storage. I detail the essential configuration files, the creation of a Typer CLI for the hook, and how to make the hook available for others by tagging versions and adding it to a project's .pre-commit-config.yaml file. Curious about how to streamline your codebase with automated checks? How might this improve your project's efficiency?

Read on... (858 words, approximately 5 minutes reading time)

pyds-cli version 0.4.0 released!

written by Eric J. Ma on 2024-04-07 | tags: pyds-cli data science standards cookiecutter templates github actions

In this blog post, I share the latest updates to pyds-cli, including the use of cookiecutter templates for easy repo scaffolding and a new talks initializer for creating talk presentations using reveal-md. These updates simplify the CLI and offer a streamlined approach to project and talk setup, reflecting my commitment to promoting best practices among data scientists. With these tools, I aim to make it easier for data scientists to adopt standardized project structures. Curious about how these updates can enhance your workflow?

Read on... (832 words, approximately 5 minutes reading time)

How to grow software development skills in a data science team

written by Eric J. Ma on 2024-04-05 | tags: data science data science team software development upskilling tooling environment productivity

In this blog post, I share insights from my 7 years in the industry on how to enhance a data science team's software development skills, focusing on the necessity of tooling and practices that make it easy and normal to do the right thing: moving from notebook explorations to production-ready code. I also discuss the importance of community practices in fostering a culture of quality software development within data science teams. How can these strategies streamline your team's workflow and elevate their software development capabilities?

Read on... (2356 words, approximately 12 minutes reading time)

Llamabot 0.4.0 Released!

written by Eric J. Ma on 2024-03-24 | tags: llamabot querybot refactor chromadb lancedb vector database hybrid search chatui mixin panel llamabot repo chat litellm contributions open source

In this blog post, I share what's new in LlamaBot 0.4.0, highlighting the decoupling of document storage from text generation in QueryBot, the introduction of the ChatUIMixin for easy web UI integration, and the switch to LanceDB for its lightweight, SQLite-like handling of vectors. I also touch on enhancements to repo chat, making it simpler to launch web-based chatbots on repository content. If you're a LlamaBot user, I'd love to hear from you about how well it works for you!

Read on... (1150 words, approximately 6 minutes reading time)

How to organize and motivate a biotech data science team

written by Eric J. Ma on 2024-03-23 | tags: data science organization motivation research biotech team activities product-oriented service-oriented career development

In this blog post, I discuss how to organize and motivate a data science team within a biotech research setting, focusing on structuring team activities around key research entities and methodologies. I highlight the importance of aligning team members with projects that match their interests and professional goals, and suggest ways to foster leadership skills without formal management roles. How do we balance the technical and career aspirations of data scientists to maintain productivity and motivation?

Read on... (1237 words, approximately 7 minutes reading time)

Mixtral-8x7b-Instruct works on an old GTX1080!

written by Eric J. Ma on 2024-03-10 | tags: mixtral 8x7b-instruct old gpu linux tower 4-bit quantized llama bot keyword generator protein engineering machine learning older commodity hardware

In this blog post, I share my experience running the Mixtral 8x7b-Instruct model on my old Linux GPU tower. I used the 4-bit quantized model and was pleasantly surprised that it worked. I generated keywords for a paper on protein engineering and machine learning using the model, and the results were comparable to GPT-4. Although the model was slower than running mistral-7b, it was still functional on older hardware. Have you tried running large language models on older hardware? Read on to find out more about my experience.

Read on... (326 words, approximately 2 minutes reading time)

From Academia to Industry: Career Advice from MIT Industry Careers Panel

written by Eric J. Ma on 2024-03-09 | tags: career panel professional development phd advice networking job search public profile portfolio management work life balance

In this blog post, I share insights from a career panel at MIT where we discussed advice for Ph.D. students about to graduate. We covered the importance of studying job postings, networking effectively, maintaining a public profile, and understanding business needs when applying for jobs. We also touched on the value of a publicly viewable portfolio and the challenge of balancing work and home life. What other advice have you heard?

Read on... (1146 words, approximately 6 minutes reading time)

Your first 90 days at work - what should you do?

written by Eric J. Ma on 2024-02-29 | tags: career development productivity tips time management brag doc professional development first job

In this blog post, I share advice for those starting a new job, focusing on the first 90 days. I discuss the importance of automating your calendar, recording your accomplishments, building a company committee, and choosing a manageable project with tangible impact. These strategies can help you gain control over your career direction and make a positive impression in your new role. How can you apply these tips to your own career journey?

Read on... (1485 words, approximately 8 minutes reading time)