Machine Learning Guide to Start-Up (on Netflix)

Siyang Sun
Dec 8, 2020


The pure excitement of seeing that your code works (from Start-Up episode 1)

I’ve never delved too deeply into the world of Korean dramas. However, as someone who works on machine learning at an early-stage startup, I was intrigued by the new show Start-Up for its premise and setting. Though unrealistic at times, the show captures a lot of the serendipity and magic, as well as the pressure, of building something from scratch. I haven’t watched a show that romanticized tech and startups since Silicon Valley, and I’ve definitely never seen one that put machine learning front and center. Despite the long episode length (nearly an hour and a half!), I somehow binged it within a few days just as the season was ending.

While watching, I found the show’s onscreen notes were a good start at defining the tech/investment jargon, but I thought there’d be value in a more in-depth explanation of the machine learning terms. Hence, this article.

This obviously contains spoilers for Season 1 of Start-Up on Netflix.

Quick note on code

A juicier piece of code containing a neural network’s inner workings. I was gonna complain about him defining his own sigmoid function instead of simply importing one (e.g., scipy.special.expit), but apparently there are marginal performance improvements from doing this… (from Start-Up episode 5)

Most of the code shown on screen is basic Python import statements or the class structure for a model. These are definitely real sections of code that ML engineers write, but they are just the very start of any ML/AI project, and certainly not the most difficult or interesting code you would be writing. For example, at one point Nam Do-san imports a decision tree package; decision trees are a genuine type of machine learning model used by data scientists, but for the task at hand, image recognition, they are fairly simple models most people would only use as a backup or baseline. Another piece of an ML project we don’t get to see or hear about is cleaning and preparing the datasets. We can ignore this if we assume the data given to them is already pretty clean, but in my experience that is rarely the case.
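We never get a clear look at a full script, but a baseline along the lines of what’s on screen might look something like the sketch below. The digits dataset and every parameter value here are my own choices for illustration, not anything from the show:

```python
# A hypothetical decision tree baseline for a small image task.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()                              # 8x8 grayscale digit images
X = digits.images.reshape(len(digits.images), -1)   # flatten each image to 64 features
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

tree = DecisionTreeClassifier(max_depth=10, random_state=42)
tree.fit(X_train, y_train)
print(f"Baseline test accuracy: {tree.score(X_test, y_test):.3f}")
```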

Episode 3

Nam Do-san convinces his colleagues that Han Ji-pyeong would be the best business partner because they share the same objective function. In optimization, the objective function is the equation that you want to minimize or maximize by changing the values of the inputs. For example, for the objective function f(x, y, z), we may want to find the combination of values of x, y, and z that gives us the highest value. Objective functions often serve as the reward/punishment mechanism in machine learning. Nam Do-san is saying that Samsan Tech and Han Ji-pyeong share the same goal.
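To make this concrete, here’s a toy sketch of optimizing an objective function with scipy. The function itself is made up for illustration, and note that maximizing is just minimizing the negative:

```python
# Minimizing a made-up objective function f(x, y, z) with scipy.
import numpy as np
from scipy.optimize import minimize

def objective(params):
    x, y, z = params
    # An arbitrary bowl-shaped function whose minimum sits at (1, 2, 3).
    return (x - 1) ** 2 + (y - 2) ** 2 + (z - 3) ** 2

result = minimize(objective, x0=np.zeros(3))
print(result.x)  # approximately [1. 2. 3.]
```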

Architecture diagram of a GAN. Samsan Tech is the Generator (from Start-Up episode 3)

During this conversation, Nam Do-san explains with a figure of a neural network on the whiteboard. The type of network displayed is a Generative Adversarial Network, or GAN, often used in image generation and processing. This network consists of two main parts: a generator, which tries to create artificial data (in most cases, images), and a discriminator, which tries to discern whether the data it sees is genuine. These opposing networks compete with and learn from each other in the GAN. Do-san labels Samsan Tech as the “Generator” side and Han Ji-pyeong as the “Discriminator” side; this mirrors reality, as the startup is the one generating (building) the company while Ji-pyeong evaluates its worth.
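For the curious, here’s a minimal sketch of one GAN training step in PyTorch. The layer sizes and the “real data” below are placeholders of my own, not the show’s architecture:

```python
# A minimal GAN training step; all sizes and data are illustrative.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64

# Generator: turns random noise into fake "data".
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
# Discriminator: outputs the probability that its input is real.
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(8, data_dim)  # stand-in for a batch of genuine data

# 1) Train the discriminator to tell real from fake.
fake = G(torch.randn(8, latent_dim)).detach()
loss_D = loss_fn(D(real), torch.ones(8, 1)) + loss_fn(D(fake), torch.zeros(8, 1))
opt_D.zero_grad()
loss_D.backward()
opt_D.step()

# 2) Train the generator to fool the discriminator.
fake = G(torch.randn(8, latent_dim))
loss_G = loss_fn(D(fake), torch.ones(8, 1))  # generator wants D to say "real"
opt_G.zero_grad()
loss_G.backward()
opt_G.step()
```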

Do-san also brings up the concept of a Nash Equilibrium, a game theory concept commonly exemplified with the Prisoner’s Dilemma (cue traumatic flashbacks for anyone who’s taken Econ 101). In a game with two or more players, a Nash Equilibrium is a situation in which no player has anything to gain by changing only their own strategy. Interestingly enough, recent research suggests that a GAN might not even have a Nash Equilibrium.
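If you want the Prisoner’s Dilemma worked out explicitly, this little brute-force check (using textbook-style payoff values of my choosing) finds the pure-strategy Nash Equilibrium:

```python
# Brute-force check for pure-strategy Nash equilibria in the Prisoner's Dilemma.
# Payoffs are (row player, column player); 0 = stay silent, 1 = betray.
payoffs = {
    (0, 0): (-1, -1),   # both stay silent
    (0, 1): (-3,  0),   # row silent, column betrays
    (1, 0): ( 0, -3),   # row betrays, column silent
    (1, 1): (-2, -2),   # both betray
}

for (a, b), (pa, pb) in payoffs.items():
    # Nash equilibrium: neither player gains by unilaterally deviating.
    row_ok = pa >= payoffs[(1 - a, b)][0]
    col_ok = pb >= payoffs[(a, 1 - b)][1]
    if row_ok and col_ok:
        print(f"Nash equilibrium: {(a, b)} with payoffs {(pa, pb)}")
# Prints only (1, 1): mutual betrayal, even though (0, 0) is better for both.
```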

Episode 5

In a flashback, Do-san uses an analogy of Jane and Tarzan to explain machine learning to Dal-mi. Tarzan (the ML model) literally knows nothing about the world outside the jungle and can only adapt based on Jane’s response (the objective function). After each iteration, Tarzan learns a little bit more about Jane, just like an ML model learns little by little over time.
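In code, “learning a little bit each iteration” is essentially gradient descent. Here’s a toy sketch on a made-up one-parameter objective (the target value of 4 is arbitrary):

```python
# Tarzan-style learning: adjust one guess a little after each response.
# Toy gradient descent on f(w) = (w - 4)**2.
w = 0.0               # Tarzan starts knowing nothing
learning_rate = 0.1
for step in range(50):
    gradient = 2 * (w - 4)         # the direction "Jane's response" points in
    w -= learning_rate * gradient  # learn a little bit each iteration
print(w)  # close to 4 after many small updates
```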

Shot of a model’s error converging (from Start-Up episode 5)

On screen we see several clips of a graph moving and dropping off. In machine learning, this is a graph of model convergence; essentially, the model trains over time and tries to reduce the error, until it starts to flatten out. If you remember from before, this error is the objective function that machine learning models try to minimize. Some of these graphs have one colored line labeled train and another labeled valid or test. Data scientists will often split a dataset into a training set, which is used to train the model, and a validation or test set to check performance on “unseen” data. This is how we can see if a model is overfitting, which is when it over-optimizes on the data it has seen and suffers in new situations.
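Here’s a quick illustration of catching overfitting with a train/test split. The unconstrained decision tree is my own stand-in for an over-complex model:

```python
# Spotting overfitting: compare train vs. test accuracy.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree will essentially memorize the training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(f"train accuracy: {model.score(X_train, y_train):.3f}")  # ~1.0
print(f"test accuracy:  {model.score(X_test, y_test):.3f}")    # noticeably lower
```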

Samsan Tech’s team runs into this issue when they notice their test accuracy is low (the accuracy on the test dataset). Dal-mi’s continuation of the Tarzan analogy (“I like rocks…”) is what sparks an idea for a solution.

A simple neural network (source: Towards Data Science)

Neural networks are made up of layers of neurons, which we can reduce in order to simplify the network. During the hackathon, Do-san realizes they need to slim down the network they had been using for their previous tech. Handwriting is image data, which is less complex than video data, and using an overly complex network for a given dataset can lead to overfitting. For this reason, reducing the size/complexity of the neural network can lead to better results.
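As a rough sketch of what “slimming down” might mean in PyTorch (the layer sizes here are hypothetical; the show never specifies the architecture):

```python
# "Slimming down" a network: fewer layers, fewer neurons per layer.
import torch.nn as nn

# An overly complex network, sized for a harder task (e.g., video):
big_model = nn.Sequential(
    nn.Linear(784, 1024), nn.ReLU(),
    nn.Linear(1024, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# A slimmer network better matched to simpler image data:
small_model = nn.Sequential(
    nn.Linear(784, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
```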

Samsan Tech’s project, a handwriting fraud discriminator, ends up going head to head on stage with the other AI team, who has built a font generator. This is essentially a live depiction of one training iteration of a GAN.

Side note: In a real hackathon / startup accelerator you would probably never unexpectedly go head to head on stage like this. The amount of logistics to even set up both teams’ hardware would be unreasonable for an event with a predetermined schedule.

Episode 6

Denoising audio (source: MathWorks)

After a fight over the cap table, Do-san assures Dal-mi that what the other guys said was “just noise”. In signal processing, whose concepts carry over into ML, noise is the irrelevant information in a dataset. ML models are built to discern the signal (what you want the model to learn) from the data while ignoring the noise. A real-world example is trying to tune out background noise in a recording of someone speaking in order to figure out what they said.
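Here’s a tiny numpy example of pulling signal out of noise; the sine wave, noise level, and smoothing window are all arbitrary choices of mine:

```python
# Separating signal from noise: a noisy sine wave smoothed with a moving average.
import numpy as np

t = np.linspace(0, 2 * np.pi, 500)
signal = np.sin(t)                                 # what we want to recover
noisy = signal + np.random.normal(0, 0.3, t.size)  # what we actually observe

window = 15
denoised = np.convolve(noisy, np.ones(window) / window, mode="same")
print(np.abs(denoised - signal).mean())  # smaller than the raw noise level
```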

Episode 7

Nam Do-san’s code for his new idea (from Start-Up episode 7)

In Python, you can add comments that won’t affect the code with a # symbol. Given this information, it’s unclear why Nam Do-san is writing comments in English. We don’t see him copy the code from Stack Overflow (a commonly used code troubleshooting website), so he must have written these comments himself. I don’t know much about AI developers in Korea, but I’m assuming they probably don’t write their code comments in perfect English.

Later in the episode, Do-san and Dal-mi are infuriated that they have been tasked with only doing data collection in their contract with Morning Group. Data collection and labeling is one of the most mundane parts of an AI project. Researchers and developers often either work with datasets that are already completed or pay other parties to collect data, usually at a low rate.

Episode 10

Examples of image data augmentation techniques (source: https://github.com/tgilewicz/aug)

In an aside conversation in their office, the team discusses design decisions for their product. Jung Sa-ha mentions that light reflection can cause false negatives and false positives, which are a challenge in any binary classification problem (predict either true or false). A false negative occurs when the model predicts False when the reality is True, and a false positive occurs when the model predicts True when the reality is False. Based on later parts of the episode, we can infer that they are talking about the face detection component of their project. Kim Yong-san suggests applying an LBP (Local Binary Pattern) pre-processing step, a texture analysis technique sometimes used in facial recognition. Nam Do-san retorts that they should instead try an R-CNN (Region-Based Convolutional Neural Network), a sophisticated type of neural network used in computer vision and object detection. Lee Chul-san also brings up data augmentation via optical transformation, a technique that creates artificial data by applying minor warping to your existing data; increasing the number of examples can help the model’s training.
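To ground a couple of these terms, here’s a sketch of an LBP pre-processing step (via scikit-image) and an augmentation pipeline (via torchvision). Both library choices and all parameter values are my picks, not anything shown on screen:

```python
# An LBP pre-processing step, roughly as Kim Yong-san suggests.
import numpy as np
from skimage.feature import local_binary_pattern

image = np.random.rand(64, 64)  # stand-in for a grayscale face crop
radius, n_points = 1, 8         # typical defaults for LBP
lbp = local_binary_pattern(image, n_points, radius, method="uniform")

# Data augmentation via optical transformations, per Lee Chul-san's idea.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomPerspective(distortion_scale=0.2, p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])
```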

A randomized clinical trial (source: Boston University MPH)

When convincing Dal-mi not to listen to the woman telling her to hold a ceremony for her ancestors, Do-san brings up a few technical concepts to refute her evidence. He argues that because she didn’t set up experimental groups in a randomized clinical trial, her alleviation of neck strain from holding such a ceremony cannot be used as concrete evidence. In science and data science, experimental groups and control groups are used to determine the effect of a treatment. The experimental groups receive the treatment, while the control groups receive no treatment (a placebo in medical settings). If these groups are sufficiently large and randomly assigned, then one can attribute the difference in results to the treatment. This technique is often used in the tech industry to test out new product features or marketing campaigns.
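Here’s what such a test often looks like in the tech industry, as a hypothetical A/B test with invented numbers:

```python
# A hypothetical A/B test: did a new feature lift conversion?
from statsmodels.stats.proportion import proportions_ztest

conversions = [120, 150]   # control, treatment (invented counts)
visitors = [1000, 1000]

stat, p_value = proportions_ztest(conversions, visitors)
print(f"p-value: {p_value:.4f}")  # a small p-value suggests the difference isn't chance
```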

Episode 16

Caught red-handed. At least they used unit tests (from Start-Up episode 16)

At the beginning of the episode, Nam Do-san notices that the twins’ version control usernames contain words that were deciphered from the ransomware. In software development, version control software is used to manage updates to a code base while keeping a full history of all previous versions. Version control services such as GitHub, the site the twins were using, often also act as a repository for all the code a developer has written, a form of online portfolio. For your career’s sake, you would definitely not want this portfolio linked to the perpetrator of a malicious hack.

Side note: Is it unrealistic that the Samsan Tech guys are world-class AI developers as well as cybersecurity wizards? Probably, but I’ll try not to dwell on it.

General thoughts

Despite some minor nitpicks, I felt that the incorporation of machine learning throughout the story was fitting and that the effort put into these details was well worth it. We all get to learn a bit about AI while we watch AI developers learn about humans. All in all, it’s a tale of learning and improving, not too different from ML algorithms. I appreciated Park Hye-ryun’s rendition of the rags-to-riches story told through the lens of an AI startup. I enjoyed the personal growth the characters underwent through the hardships they faced; as they say, all’s fair in love and entrepreneurship. Though it all worked out in the end and they had a safety net along the way, the stakes felt real and their success felt genuine. I might be missing the point on why people liked the story, but to me the most satisfying parts of the conclusion were the changes each character experienced. Sometimes it takes a person to influence you to change; sometimes, a group of people; always, it takes hard work.
