Blog - Eric J. Ma's Personal Site

How to grow software development skills in a data science team

written by Eric J. Ma on 2024-04-05 | tags: data science data science team software development upskilling tooling environment productivity

In this blog post, I share insights from my 7 years in the industry on how to enhance a data science team's software development skills, focusing on the necessity of tooling and practices that make it easy and normal to do the right thing: moving from notebook explorations to production-ready code. I also discuss the importance of community practices in fostering a culture of quality software development within data science teams. How can these strategies streamline your team's workflow and elevate their software development capabilities?

Read on... (2368 words, approximately 12 minutes reading time)

Llamabot 0.4.0 Released!

written by Eric J. Ma on 2024-03-24 | tags: llamabot querybot refactor chromadb lancedb vector database hybrid search chatui mixin panel llamabot repo chat litellm contributions open source

In this blog post, I share the latest updates of LlamaBot 0.4.0, highlighting the decoupling of document storage from text generation in QueryBot, the introduction of the ChatUIMixin for easy web UI integration, and the switch to LanceDB for its lightweight, SQLite-like handling of vectors. I also touch on enhancements to repo chat, making it simpler to launch web-based chatbots on repository content. If you're a llamabot user, I'd love to hear from you about how well it works for you!

Read on... (1156 words, approximately 6 minutes reading time)

How to organize and motivate a biotech data science team

written by Eric J. Ma on 2024-03-23 | tags: data science organization motivation research biotech team activities product-oriented service-oriented career development

In this blog post, I discuss about organizing and motivating a data science team within a biotech research setting, focusing on structuring team activities around key research entities and methodologies. I highlight the importance of aligning team members with projects that match their interests and professional goals, and suggest ways to foster leadership skills without formal management roles. How do we balance the technical and career aspirations of data scientists to maintain productivity and motivation?

Read on... (1240 words, approximately 7 minutes reading time)

Mixtral-8x7b-Instruct works on an old GTX1080!

written by Eric J. Ma on 2024-03-10 | tags: mixtral 8x7b-instruct old gpu linux tower 4-bit quantized llama bot keyword generator protein engineering machine learning older commodity hardware

In this blog post, I share my experience running the Mixtral 8x7b-Instruct model on my old Linux GPU tower. I used the 4-bit quantized model and was pleasantly surprised that it worked. I generated keywords for a paper on protein engineering and machine learning using the model, and the results were comparable to GPT-4. Although the model was slower than running mistral-7b, it was still functional on older hardware. Have you tried running large language models on older hardware? Read on to find out more about my experience.

Read on... (326 words, approximately 2 minutes reading time)

From Academia to Industry: Career Advice from MIT Industry Careers Panel

written by Eric J. Ma on 2024-03-09 | tags: career panel professional development phd advice networking job search public profile portfolio management work life balance

In this blog post, I share insights from a career panel at MIT where we discussed advice for Ph.D. students about to graduate. We covered the importance of studying job postings, networking effectively, maintaining a public profile, and understanding business needs when applying for jobs. We also touched on the value of a publicly viewable portfolio and the challenge of balancing work and home life. What other advice have you heard?

Read on... (1150 words, approximately 6 minutes reading time)

Your first 90 days at work - what should you do?

written by Eric J. Ma on 2024-02-29 | tags: career development productivity tips time management brag doc professional development first job

In this blog post, I share advice for those starting a new job, focusing on the first 90 days. I discuss the importance of automating your calendar, recording your accomplishments, building a company committee, and choosing a manageable project with tangible impact. These strategies can help you gain control over your career direction and make a positive impression in your new role. How can you apply these tips to your own career journey?

Read on... (1490 words, approximately 8 minutes reading time)

How to keep sharp with technical skills as a data science team lead

written by Eric J. Ma on 2024-02-25 | tags: data science leadership coaching mentorship continuous learning technical skills machine learning pair coding code review

In this blog post, I share my strategies for maintaining technical skills as a data science team lead. Balancing management duties with technical tasks, I use strategies like performing lower-level tasks, pair coding, code reviews, asking questions, and prototyping. These methods help me stay sharp and credible with my team, while also fostering personal growth. How do you keep your technical skills up-to-date?

Read on... (1013 words, approximately 6 minutes reading time)

LlamaBot with Ollama on my home virtual private network

written by Eric J. Ma on 2024-02-21 | tags: gpu deep learning ollama llm tailscale linux ubuntu gpu llamabot

In this blog post, I share how I breathed new life into my idle GPU tower by running an Ollama server on my home's private network. I connected all my devices via a Tailscale virtual private network and installed Ollama on my GPU server. I then used LlamaBot to build bots that utilized the Ollama server. This turned out to be an effective way to extend the usable life of my GPU box. Curious about how you can do the same with your idle GPU? Read on!

Read on... (1276 words, approximately 7 minutes reading time)

Dashboard-ready data is often machine learning-ready data

written by Eric J. Ma on 2024-02-18 | tags: data science machine learning data engineering python packages chemical screening predictive models data curation

In this blog post, I discuss the overlap between dashboard-ready and machine-learning-ready data. I share an example from a chemical screening campaign, where the same data used for a dashboard can also be used for machine learning models. I explore the reasons behind this from both a statistical and business perspective. How can you gain leverage in your data for both purposes?

Read on... (426 words, approximately 3 minutes reading time)

Success Factors for Data Science Teams in Biotech

written by Eric J. Ma on 2024-02-07 | tags: talks conferences slas2024 data science biotech

In this blog post, I shared my insights from the SLAS 2024 conference on how a data science team can deliver long-lasting impact in a biotech research setting. I discussed the importance of bounding work, identifying high-value use cases, possessing technical and interpersonal skills, and having the right surrounding context. I also shared some personal experiences and lessons learned from my work at Moderna. How can these insights help your data science team become successful agents of high-value delivery? Read on to find out!

Read on... (2490 words, approximately 13 minutes reading time)

Eric J Ma's Website