Google’s reCAPTCHA service is marketed as a means to protect websites from bots. If the system suspects that a bot is trying to access a site, it puts up a test that only humans should be able to pass. If you spend enough time on the internet, you will have seen a version of it: a panel of images comes up and you have to select all the images that contain a fire hydrant, a car or a bridge. If you have interacted with this system while trying to get access to your favourite website, congratulations: you have contributed to a Google machine learning model by labelling some data for them. Deep inside Google’s reCAPTCHA webpages, this is what the company says about the use of data captured from this system:
reCAPTCHA also makes positive use of the human effort spent in solving CAPTCHAs by using the solutions to digitize text, annotate images, and build machine-learning datasets. This in turn helps preserve books, improve maps, and solve hard AI problems.
In a nutshell, supervised machine learning models attempt to classify data by learning the patterns, or features, that characterise the different classes. To do this, a supervised machine learning model is supplied with a lot of labelled data, called training data. Labelled data is data that comes with a tag identifying its class. A supervised ML algorithm learns the features associated with each class so that it can classify new data.
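To make the idea of learning from labelled data concrete, here is a minimal sketch in Python. It is not how Google trains its models. It uses a deliberately simple technique, a nearest-centroid classifier, and everything in it is an assumption made for illustration: the two numeric features, their values, and the function names `train` and `classify`. Real image models learn from pixels and use far richer features.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Learn one centroid (mean feature vector) per class from labelled data."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        # Accumulate feature values per class so we can average them later.
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(centroids, features):
    """Assign new data to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training data: each item is (features, label).
# The two features are made-up numbers standing in for image properties.
training_data = [
    ((0.9, 0.2), "fire hydrant"),
    ((0.8, 0.3), "fire hydrant"),
    ((0.2, 0.9), "car"),
    ((0.3, 0.8), "car"),
]

model = train(training_data)
prediction = classify(model, (0.85, 0.25))
print(prediction)
```

Each tile a person clicks in a reCAPTCHA challenge plays the role of one `(features, label)` pair here: the image supplies the features and the human supplies the label.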
Read the full article on Towards Data Science.