written by Eric J. Ma on 2024-05-05 | tags: data science biotech team management tutorial odsc east mission statement problem solving value delivery hiring challenges leadership
In this blog post, I share discussion insights from a hands-off tutorial I led at ODSC East on setting up a successful data science team within a biotech research organization. We explored formulating a mission, identifying problem classes, articulating value, and addressing challenges. I used my experience at Moderna to illustrate points, emphasizing the unique aspects of biotech data science. Despite not covering all topics due to time constraints, the discussion was enlightening, highlighting the contrast between biotech and other industries. How can these insights apply to your organization's data science team?
Read on... (2880 words, approximately 15 minutes reading time)
written by Eric J. Ma on 2024-04-17 | tags: bioit world conference data science llms software development productivity tools ai training code completion debugging documentation commit messages
In this blog post, I share insights from my talk at the BioIT World conference in 2024, focusing on how LLMs empower data scientists and the necessity of software development skills in data science. I discuss practical applications of LLMs, such as code completion, documentation, debugging, and learning new domains, highlighting their role in enhancing productivity and efficiency. LLMs not only automate mundane tasks but also facilitate rapid knowledge acquisition, proving to be invaluable tools for data science teams. How could LLMs transform your data science work?
Read on... (2071 words, approximately 11 minutes reading time)
written by Eric J. Ma on 2024-04-09 | tags: pre-commit webp optimization python
In this blog post, I share my journey of creating my first distributable pre-commit hook, convert-to-webp, using the pre-commit framework. This hook automatically converts images to the .webp format before they're committed to a repository, ensuring optimized image storage. I detail the essential configuration files, the creation of a Typer CLI for the hook, and how to make the hook available for others by tagging versions and adding it to a project's .pre-commit-config.yaml file. Curious about how to streamline your codebase with automated checks? How might this improve your project's efficiency?
written by Eric J. Ma on 2024-04-07 | tags: pyds-cli data science standards cookiecutter templates github actions
In this blog post, I share the latest updates to pyds-cli, including the use of cookiecutter templates for easy repo scaffolding and a new talks initializer for creating talk presentations using reveal-md. These updates simplify the CLI and offer a streamlined approach to project and talk setup, reflecting my commitment to promoting best practices among data scientists. With these tools, I aim to make it easier for data scientists to adopt standardized project structures. Curious about how these updates can enhance your workflow?
written by Eric J. Ma on 2024-04-05 | tags: data science data science team software development upskilling tooling environment productivity
In this blog post, I share insights from my 7 years in the industry on how to enhance a data science team's software development skills, focusing on the necessity of tooling and practices that make it easy and normal to do the right thing: moving from notebook explorations to production-ready code. I also discuss the importance of community practices in fostering a culture of quality software development within data science teams. How can these strategies streamline your team's workflow and elevate their software development capabilities?
Read on... (2368 words, approximately 12 minutes reading time)
written by Eric J. Ma on 2024-03-24 | tags: llamabot querybot refactor chromadb lancedb vector database hybrid search chatui mixin panel llamabot repo chat litellm contributions open source
In this blog post, I share the latest updates of LlamaBot 0.4.0, highlighting the decoupling of document storage from text generation in QueryBot, the introduction of the ChatUIMixin for easy web UI integration, and the switch to LanceDB for its lightweight, SQLite-like handling of vectors. I also touch on enhancements to repo chat, making it simpler to launch web-based chatbots on repository content. If you're a llamabot user, I'd love to hear from you about how well it works for you!
Read on... (1156 words, approximately 6 minutes reading time)
written by Eric J. Ma on 2024-03-23 | tags: data science organization motivation research biotech team activities product-oriented service-oriented career development
In this blog post, I discuss organizing and motivating a data science team within a biotech research setting, focusing on structuring team activities around key research entities and methodologies. I highlight the importance of aligning team members with projects that match their interests and professional goals, and suggest ways to foster leadership skills without formal management roles. How do we balance the technical and career aspirations of data scientists to maintain productivity and motivation?
Read on... (1240 words, approximately 7 minutes reading time)
written by Eric J. Ma on 2024-03-10 | tags: mixtral 8x7b-instruct old gpu linux tower 4-bit quantized llama bot keyword generator protein engineering machine learning older commodity hardware
In this blog post, I share my experience running the Mixtral 8x7b-Instruct model on my old Linux GPU tower. I used the 4-bit quantized model and was pleasantly surprised that it worked. I generated keywords for a paper on protein engineering and machine learning using the model, and the results were comparable to GPT-4. Although the model was slower than running mistral-7b, it was still functional on older hardware. Have you tried running large language models on older hardware? Read on to find out more about my experience.
Read on... (326 words, approximately 2 minutes reading time)
written by Eric J. Ma on 2024-03-09 | tags: career panel professional development phd advice networking job search public profile portfolio management work life balance
In this blog post, I share insights from a career panel at MIT where we discussed advice for Ph.D. students about to graduate. We covered the importance of studying job postings, networking effectively, maintaining a public profile, and understanding business needs when applying for jobs. We also touched on the value of a publicly viewable portfolio and the challenge of balancing work and home life. What other advice have you heard?
Read on... (1150 words, approximately 6 minutes reading time)
written by Eric J. Ma on 2024-02-29 | tags: career development productivity tips time management brag doc professional development first job
In this blog post, I share advice for those starting a new job, focusing on the first 90 days. I discuss the importance of automating your calendar, recording your accomplishments, building a company committee, and choosing a manageable project with tangible impact. These strategies can help you gain control over your career direction and make a positive impression in your new role. How can you apply these tips to your own career journey?
Read on... (1490 words, approximately 8 minutes reading time)