My brain works better when I get it all out.
I'm doing research this semester on machine learning and fairness. But before I can really dig into that, I need a firmer grasp of machine learning terminology. So I've been working my way through a paper called "A Few Useful Things to Know about Machine Learning" by Pedro Domingos.
Sounds sweet and gentle, right? It's not (for little ol' me at least).
Part of the problem is that I'm a bit of an island on this one. This research is for an independent study that covers a lot more than the details of machine learning algorithms. It's about the process of regulating these tools, and involves public policy, law, ethical frameworks, corporations, nongovernmental regulatory bodies, individual psychology, education, and more.
That means I'm making the most out of my interdisciplinary program. But it also means that I have to save up my big computer science questions and try to find a nice computer scientist to talk them out with me, because I often learn best when I can ask questions and interrogate responses. Until then, I'm left to my own devices. And that often involves writing out ideas to try to get them straight in my head.
Last night, I tackled this sentence by Googling things, searching the trusty Artificial Intelligence: A Modern Approach, and asking questions of a kind software engineer:
"A linear learner has high bias, because when the frontier between two classes is not a hyperplane the learner is unable to induce it."
This is describing an area in which data scientists need to be particularly careful when designing machine learning algorithms. Here's my translation:
"A computer model that is designed to find a way to explain patterns in example data by drawing a clear-cut, straight boundary between Things of Type 1 and Things of Type 2 can be way off in some cases. For one, the patterns in the data might be messier than can be represented by a straight boundary. It might be impossible to clearly say, 'Hey, everyone on that side of the line is this thing, and everyone on this side of the line is that thing.' Things from the two categories may be mixed together.
Applying a computer model that tries to find a straight boundary to that kind of mixed-up data will still give you a model. It will find some sort of boundary. But it won't draw the right conclusions from the examples it is fed, and it won't tell you how the real world actually works. This is one reason it is important to make sure you understand your data and are looking for the right kinds of patterns."
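To see the "messy boundary" problem for myself, I tried a tiny experiment (this is my own toy sketch, not something from the Domingos paper): a perceptron, which is a classic straight-line learner, trained on two little datasets. The AND pattern can be split by a straight line, so the perceptron nails it. The XOR pattern can't be, so no matter how long the perceptron trains, it keeps misclassifying at least one point. The function names and data here are just my own illustration.

```python
def perceptron(points, labels, epochs=100, lr=0.1):
    """Train a straight-line (linear) classifier: predict 1 when
    w1*x1 + w2*x2 + b > 0, and nudge the weights after each mistake."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = y - pred  # 0 when correct, +1 or -1 when wrong
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def accuracy(w, b, points, labels):
    """Fraction of points the learned straight boundary gets right."""
    correct = sum(
        (1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == y
        for (x1, x2), y in zip(points, labels)
    )
    return correct / len(labels)

pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]  # AND: one straight line separates the classes
xor_labels = [0, 1, 1, 0]  # XOR: no straight line can separate them

w_and, b_and = perceptron(pts, and_labels)
w_xor, b_xor = perceptron(pts, xor_labels)

print(accuracy(w_and, b_and, pts, and_labels))  # perfect on AND
print(accuracy(w_xor, b_xor, pts, xor_labels))  # stuck below perfect on XOR
```

On AND the model finds a line that works; on XOR it can only ever get three of the four points right, because the true "frontier between the two classes" simply isn't a straight line. That, as I understand it, is the high bias the paper is talking about.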
And here's how I got to that... (If you're a machine learning person and happen to be reading this, please let me know if I got anything wrong so I can learn!)