Blog - Eric J. Ma's Personal Site

From Academia to Industry: Career Advice from MIT Industry Careers Panel

written by Eric J. Ma on 2024-03-09 | tags: career panel professional development phd advice networking job search public profile portfolio management work life balance

In this blog post, I share insights from a career panel at MIT where we discussed advice for Ph.D. students about to graduate. We covered the importance of studying job postings, networking effectively, maintaining a public profile, and understanding business needs when applying for jobs. We also touched on the value of a publicly viewable portfolio and the challenge of balancing work and home life. What other advice have you heard?

Read on... (1146 words, approximately 6 minutes reading time)

Your first 90 days at work - what should you do?

written by Eric J. Ma on 2024-02-29 | tags: career development productivity tips time management brag doc professional development first job

In this blog post, I share advice for those starting a new job, focusing on the first 90 days. I discuss the importance of automating your calendar, recording your accomplishments, building a company committee, and choosing a manageable project with tangible impact. These strategies can help you gain control over your career direction and make a positive impression in your new role. How can you apply these tips to your own career journey?

Read on... (1485 words, approximately 8 minutes reading time)

How to keep sharp with technical skills as a data science team lead

written by Eric J. Ma on 2024-02-25 | tags: data science leadership coaching mentorship continuous learning technical skills machine learning pair coding code review

In this blog post, I share my strategies for maintaining technical skills as a data science team lead. To balance management duties with technical work, I use strategies such as taking on lower-level tasks, pair coding, code reviews, asking questions, and prototyping. These methods help me stay sharp and credible with my team while also fostering personal growth. How do you keep your technical skills up to date?

Read on... (1007 words, approximately 6 minutes reading time)

LlamaBot with Ollama on my home virtual private network

written by Eric J. Ma on 2024-02-21 | tags: gpu deep learning ollama llm tailscale linux ubuntu llamabot

In this blog post, I share how I breathed new life into my idle GPU tower by running an Ollama server on my home's private network. I connected all my devices via a Tailscale virtual private network and installed Ollama on my GPU server. I then used LlamaBot to build bots that utilized the Ollama server. This turned out to be an effective way to extend the usable life of my GPU box. Curious about how you can do the same with your idle GPU? Read on!

Read on... (1266 words, approximately 7 minutes reading time)

Dashboard-ready data is often machine learning-ready data

written by Eric J. Ma on 2024-02-18 | tags: data science machine learning data engineering python packages chemical screening predictive models data curation

In this blog post, I discuss the overlap between dashboard-ready and machine-learning-ready data. I share an example from a chemical screening campaign, where the same data used for a dashboard can also be used for machine learning models. I explore the reasons behind this from both a statistical and business perspective. How can you gain leverage in your data for both purposes?

Read on... (426 words, approximately 3 minutes reading time)

Success Factors for Data Science Teams in Biotech

written by Eric J. Ma on 2024-02-07 | tags: talks conferences slas2024 data science biotech

In this blog post, I share my insights from the SLAS 2024 conference on how a data science team can deliver long-lasting impact in a biotech research setting. I discuss the importance of bounding work, identifying high-value use cases, possessing both technical and interpersonal skills, and having the right surrounding context. I also share some personal experiences and lessons learned from my work at Moderna. How can these insights help your data science team consistently deliver high-value work? Read on to find out!

Read on... (2476 words, approximately 13 minutes reading time)

An (incomplete and opinionated) survey of LLM tooling

written by Eric J. Ma on 2024-02-01 | tags: language model open source api switchboards python vector retrieval prompt experimentation ui builders command line interfaces large language models llms thought framework thought leadership

In this blog post, I explore the rapidly evolving landscape of large language model (LLM) tooling, discussing APIs, self-hosting, API switchboards, Python-based LLM Application SDKs, vector-based retrieval, prompt experimentation, evaluation, UI builders, and command-line interfaces. I share my experiences building LlamaBot and offer principles for making smart tech stack choices in this ever-changing field. How can you navigate this dynamic ecosystem and make the best decisions for your LLM projects? Read on to find out!

Read on... (1599 words, approximately 8 minutes reading time)

Exploratory data analysis isn’t open-ended

written by Eric J. Ma on 2024-01-28 | tags: data science eda exploratory data analysis pandas matplotlib seaborn correlations generative models therapeutics biological sequences metadata visualization

In this blog post, I challenge the traditional approach to exploratory data analysis (EDA) in data science. I argue that EDA should be directed and purposeful, not aimless. I share key principles for effective EDA, including falsifying our assumptions, having a clear end purpose, and embracing iteration when purposes are invalidated. I also emphasize the importance of practice and domain expertise in developing this skill. How can we make EDA more purposeful and effective in our data science work? Read on to find out!

Read on... (768 words, approximately 4 minutes reading time)

Your embedding model can be different from your text generation model

written by Eric J. Ma on 2024-01-15 | tags: embedding models retrieval augmented generation semantic search text generation vector databases llamabot documentstore sentence transformer

In this blog post, I debunk the misconception that the embedding model must match the text generation model in retrieval-augmented generation (RAG). I explain how these models are decoupled: the choice of embedding model affects only the quality of the content retrieved, not the text generation itself. I also share my preference for SentenceTransformer due to its cost-effectiveness and performance. Finally, I describe how I updated LlamaBot to reflect this understanding, allowing for more flexible model composition. Curious about how this could change your approach to RAG? Read on!

Read on... (526 words, approximately 3 minutes reading time)

GitHub Actions secrets need to be explicitly declared

written by Eric J. Ma on 2024-01-11 | tags: llamabot mistral gpt-4 api key environment variables github actions repository secret workflow step

In this blog post, I share my experience of debugging GitHub Actions for LlamaBot. I ran into trouble setting the Mistral API key as an environment variable in my GitHub Actions workflow. After hours of frustration, I discovered that a GitHub Actions workflow can only read a secret if that secret is explicitly referenced in the workflow. I explain how to include it in a workflow step. Curious about how to securely manage your API keys in GitHub Actions? Read on!

Read on... (212 words, approximately 2 minutes reading time)
