Eric's Notes

Reading Bazaar

Interesting reads from my random walk over the internet

Enterprise Software Monetization is Fat-Tailed
Microsoft is allowing employees to go remote permanently
Notes from Work Rules on hiring
Limitations of Graph Neural Networks
Running Python on .NET 5
Why are tech companies making custom typefaces
Anatomy of a probabilistic programming framework
Will the M1 macs run the PyData stack
Electric bicycles that double up as cars
Avoiding technical debt with ML pipelines

Electric bicycles that double up as cars

This is a thing I'm looking at more and more closely.

ELF: built by Organic Transit, based in Durham, NC. Unfortunately, the company is looking to be sold, so I'm not sure whether they will continue selling the ELF.

PEBL: built by Better Bikes. This one is based in Boston, MA. Their designs are better suited to the harsh winters of Boston. However, they are also more expensive.

Anatomy of a probabilistic programming framework

From George Ho: https://eigenfoo.xyz/prob-prog-frameworks/

Key ingredients of a probabilistic programming framework:

1. Language for specifying a model.
2. Library of probability distributions + facilities to specify arbitrary distributions.
3. Inference algorithm belonging to at least one of MCMC or VI.
4. An optimizer, to compute mode of posterior density.
5. Autodiff library, to compute gradients for items 3 and 4 (inference algo + optimizer).
6. Diagnostics suite to analyze quality of inference.
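
To make the list above concrete for myself, here is a minimal sketch in plain Python. It is a toy, not how PyMC3 or any real framework is built. It covers a model specification (`log_posterior`), one distribution (`normal_logpdf`), an MCMC inference algorithm (random-walk Metropolis), and a single crude diagnostic (the acceptance rate). It skips the optimizer and autodiff, since Metropolis needs no gradients. All names, the data, and the prior are my own assumptions.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)


def log_posterior(mu, data):
    """Model: mu ~ Normal(0, 10); each y ~ Normal(mu, 1). Unnormalized."""
    log_prior = normal_logpdf(mu, 0.0, 10.0)
    log_likelihood = sum(normal_logpdf(y, mu, 1.0) for y in data)
    return log_prior + log_likelihood


def metropolis(logp, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler; returns samples and acceptance rate."""
    samples, accepted = [], 0
    current, current_logp = start, logp(start)
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_logp = logp(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, proposal_logp - current_logp)):
            current, current_logp = proposal, proposal_logp
            accepted += 1
        samples.append(current)
    return samples, accepted / n_steps


data = [4.8, 5.1, 5.3, 4.9, 5.2]  # made-up observations
rng = random.Random(0)
samples, acceptance_rate = metropolis(
    lambda mu: log_posterior(mu, data), start=0.0, n_steps=5000, step_size=1.0, rng=rng
)
kept = samples[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean of mu: {posterior_mean:.2f}")
print(f"acceptance rate (diagnostic): {acceptance_rate:.2f}")
```

With this little data and a wide prior, the posterior mean should land close to the sample mean of the observations.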

PyMC3 provides a whole lot of these, alongside ArviZ!

Will the M1 macs run the PyData stack

According to this post, the answer is yes!

GitHub user @mwidjaja put up some notes on his GitHub repo. In there, the key takeaways I saw are:

It's complicated.
Homebrew doesn't work, unless done with x86 emulation.
To enable Rosetta to do x86 emulation all the time in a terminal, we follow instructions here.
Install miniconda for macOS ARM.
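
One quick sanity check after following steps like these is to ask Python which architecture it thinks it is running on. A small sketch, assuming that `platform.machine()` reports `arm64` for a native Apple Silicon interpreter and `x86_64` for one running under Rosetta; the helper name is my own.

```python
import platform


def describe_architecture(machine):
    """Map a platform.machine() string to a short description."""
    if machine == "arm64":
        return "native Apple Silicon (ARM64)"
    if machine == "x86_64":
        return "x86_64 (on an M1 Mac, this means Rosetta emulation)"
    return f"other architecture: {machine}"


print(describe_architecture(platform.machine()))
```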

The conda-forge team has done a lot to port over the entire conda stack to Apple Silicon. See more here.

As of 21 November 2020, the conclusion I have is that "no, it's complicated, don't try running Anaconda on M1 macs just yet".

Rather, for data science use cases, a remote server is a better investment.

Limitations of Graph Neural Networks

Article: https://towardsdatascience.com/limitations-of-graph-neural-networks-2412fffe677

Some thoughts below:

We need injective node aggregation functions in order to guarantee that graphs that are different will be differentiated from one another.

This probably means that a simple sum isn't good enough and a mean won't cut it.
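
A toy check of this, using made-up scalar node features rather than anything from the article: two different neighborhoods can collapse to the same value under a plain sum or mean, while summing one-hot encodings (which amounts to counting each feature value) keeps them apart. The function names and the feature vocabulary are my own assumptions.

```python
def aggregate_sum(features):
    """Plain sum over neighbor features."""
    return sum(features)


def aggregate_mean(features):
    """Plain mean over neighbor features."""
    return sum(features) / len(features)


def aggregate_one_hot_sum(features, vocabulary):
    """Sum of one-hot encodings, i.e. a count of each feature value."""
    return tuple(sum(1 for f in features if f == v) for v in vocabulary)


# Two different multisets of neighbor features.
neighbors_a = [1.0, 3.0]
neighbors_b = [2.0, 2.0]
vocabulary = [1.0, 2.0, 3.0]

sum_collides = aggregate_sum(neighbors_a) == aggregate_sum(neighbors_b)
mean_collides = aggregate_mean(neighbors_a) == aggregate_mean(neighbors_b)
counts_a = aggregate_one_hot_sum(neighbors_a, vocabulary)
counts_b = aggregate_one_hot_sum(neighbors_b, vocabulary)

print("sum collides:", sum_collides)
print("mean collides:", mean_collides)
print("one-hot sums:", counts_a, counts_b)
```

So what matters is the combination of feature encoding and aggregator, not the aggregator alone.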

Running Python on .NET 5

This one is by Anthony Shaw, and I found it first via Twitter, which took me to his blog post.

Some things I learned in there.

Firstly, CPython is slow in "tight loop" problems.
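
By "tight loop" I mean a small loop body executed many times, so that the interpreter's per-iteration overhead dominates. A minimal sketch for timing one on your own machine; the function and the sizes are my own choices, not taken from the blog post, and the timing will vary by machine.

```python
import timeit


def sum_of_squares_loop(n):
    """A tight loop: a tiny body repeated n times."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Time 10 runs of the loop over 100,000 iterations each.
elapsed = timeit.timeit(lambda: sum_of_squares_loop(100_000), number=10)
print(f"{elapsed:.3f} s for 10 runs")
```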

To be reviewed later, if I get the chance.

index

This is the landing page for my notes.

This is 100% inspired by Andy Matuschak's famous notes page. I'm not technically skilled enough to replicate the full "Andy Mode", though, so I just did some simple hacks. If you're curious how these notes are compiled, check out the summary in How these notes are made into HTML pages.

This is my "notes garden". I tend to it on a daily basis, and it contains some of my less fully-formed thoughts. Nothing here is intended to be cited, as the link structure evolves over time. The notes are best viewed on a desktop/laptop computer, because of the use of hovers for previews.

There's no formal "navigation", or "search" for these pages. To go somewhere, click on any of the "high-level" notes below, and enjoy.

Notes on statistics
Notes on differential computing
The State of Data Science
Network science
Scholarly readings
Software skills for data scientists
The Data Science Programming Newsletter MOC
Life and computer hacks
Reading Bazaar
Blog drafts
Conference Proposals

Microsoft is allowing employees to go remote permanently

News article can be found here.

Quotable:

The memo highlights the company’s plans to create a “hybrid workplace.” Microsoft said it will allow employees to work from home freely for less than 50% of their working week, but has said that managers will be able to approve permanent remote work if staff request it. Part-time working hours will also be available for employees with approval from their manager.

Why are tech companies making custom typefaces

Original URL: https://www.arun.is/blog/custom-typefaces/

The key economic argument for custom typefaces is licensing fees. Especially when one has to go global and have fonts for non-Latin scripts.

However, the author argues that there's more than just licensing and branding. The author claims that fonts have a function and purpose, like furniture, using the following quote:

It is important to understand that a typeface is not a piece of art. it has a purpose, like a chair or an engine have. accordingly, before even putting pencil to paper, it is important to understand the requirements of what the typeface is trying to achieve. only once these requirements are known should actual design commence.
-- Bruno Maag

However, I don't see what the "functional purpose" of a typeface is, apart from some hints from the author, like being 'instantly recognizable' and the like.

Tracing the argument further to Bruno Maag's interview:

designing a new typeface gives the client complete control over the look and feel, and probably more importantly, how this look and feel is translated across the diverse range of media we are operating in today.

a large part of development time is taken up by technical activities such as font engineering and hinting. this makes sure that the fonts work in a broad range of environments, across both printed and digital applications.

by just reducing the character widths a few percent – without affecting legibility – it is possible to save paper, ink, print times etc.

Enterprise Software Monetization is Fat-Tailed

URL: https://whoisnnamdi.com/software-fat-tailed/

Take-aways:

Here, land and expand is effectively an indexing strategy — land at as many organizations with as little investment as possible. Every once in a while you'll land a Google, a Facebook, or an Amazon (both figuratively and literally) which will drive a disproportionate share of revenue.

Further, it can make sense to overspend somewhat on establishing those small beachheads, as they likely underestimate the true average contract value. For this reason, common metrics for evaluating the efficiency of software sales like the "magic number" may underestimate the efficiency of land and expand models, especially during the land phase.
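
The "underestimate" point can be checked with a small simulation. This is a sketch under my own assumptions, not the article's analysis: contract values are drawn from a Pareto distribution with tail index 1.2 (so the true mean is about 6), and each trial looks at only 50 customers. In a fat-tailed world, most small samples miss the rare huge customer, so their average sits below the true mean.

```python
import random

ALPHA = 1.2                      # assumed tail index (smaller = fatter tail)
TRUE_MEAN = ALPHA / (ALPHA - 1)  # mean of a Pareto with minimum 1, about 6
N_CUSTOMERS = 50                 # customers observed per trial
N_TRIALS = 2000

rng = random.Random(42)


def sample_mean(n):
    """Average contract value over n simulated customers."""
    return sum(rng.paretovariate(ALPHA) for _ in range(n)) / n


means = [sample_mean(N_CUSTOMERS) for _ in range(N_TRIALS)]
fraction_below = sum(m < TRUE_MEAN for m in means) / N_TRIALS
print(f"true mean: {TRUE_MEAN:.2f}")
print(f"share of trials whose sample mean is below it: {fraction_below:.2f}")
```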

The mathematical analysis inside here is pretty rad.

Notes from Work Rules on hiring

Sure, it can be fun to ask “What song best describes your work ethic?” or “What do you think about when you’re alone in your car?”—both real interview questions from other companies—but the point is to identify the best person for the job, not to indulge yourself by asking questions that trigger your biases (“OMG! I think about the same things in the car!”) and don’t have a proven link to getting the job done.

A common theme: it’s not the fancy-schmancy thing that brings us what we need. It’s the boring, mundane thing.

For example, the US Department of Veterans Affairs has a site with almost a hundred sample questions at www.va.gov/pbi/questions.asp.

And to be fair, we have moved from a philosophy of hiring exclusively generalists to a more refined approach, where we look across our portfolio of talent and ensure we have the right balance of generalists and experts. One of the luxuries of scale is that you can build areas of deep specialization, but even in those pockets we monitor to make sure there is always an influx of fresh, nonexpert thinking.

Ryan Tate of Wired wrote the best summary of it I’ve seen: Here is what [20 percent time] is not: A fully fleshed corporate program with its own written policy, detailed guidelines, and manager. No one gets a “20 percent time” packet at orientation, or pushed into distracting themselves with a side project. Twenty percent time has always operated on a somewhat ad hoc basis, providing an outlet for the company’s brightest, most restless, and most persistent employees—for people determined to see an idea through to completion…

Avoiding technical debt with ML pipelines

One of my readings from the web, taken from here.

By Hamza Tahir.

Emphasis on pipelines, end-to-end ownership, well-defined interfaces... sounds a ton like modern software development. (Which is why I think knowing modern software development workflows is a boon for data scientists.)