
Jongi - it all starts with data!

  • Aimee Newman
  • Nov 23, 2020
  • 2 min read

Artificial intelligence (AI) models are only as smart as the datasets they process. That is why it's important that the data consumed by these AI models accurately represent the target being modeled.


Humans are good at making sense of the real world and referencing objects by their names, or labels. Computers, on the other hand, need to be instructed to differentiate classes of data. Humans must therefore classify data for the computer before it can be used to train AI models.


Data annotation is the process of labelling data. Labelling adds meaning to the data and is a crucial step before training an AI model. Labels represent classes in a training dataset, and annotation involves attaching a meaningful class label to each portion of data for the machine to reference while training. Once annotated, datasets can be used to train the models behind chatbots, autonomous vehicles and natural language processing systems.


Data comes in various forms, including text, audio, images and video. Annotation use cases include labelling good and defective parts in production line images, or cars on a highway in a traffic video feed. One might annotate images from a production line by, for example, drawing one rectangle around each good part and one around each defective part. Each rectangle demarcates a part in the image, and the label attached to it indicates whether that part is good or defective.
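To make the rectangle example concrete, here is a minimal sketch in Python of how such annotations might be stored. The `BoundingBox` class, its field names and the pixel values are all hypothetical, chosen only for illustration; real annotation tools each use their own formats.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """One labelled rectangle in pixel coordinates (hypothetical schema)."""
    label: str    # class label, e.g. "good" or "defective"
    x: int        # left edge of the rectangle
    y: int        # top edge of the rectangle
    width: int
    height: int


def count_labels(boxes):
    """Count how many rectangles carry each class label."""
    return Counter(box.label for box in boxes)


# Made-up annotations for a single production line image.
annotations = [
    BoundingBox("good", x=10, y=20, width=50, height=40),
    BoundingBox("good", x=80, y=22, width=48, height=41),
    BoundingBox("defective", x=150, y=18, width=52, height=39),
]

label_counts = count_labels(annotations)
print(label_counts)
```

Each record pairs a region of the image with its class label, which is the information a model would reference during training.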


High-quality datasets reduce the amount of training data required. By some estimates, if just 10% of the data is incorrectly labelled, one might need almost double the amount of data to train a model to a similar level of accuracy. Noise in a dataset might come from unclear labelling instructions, labeller inconsistency and unclear class definitions.
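One rough way to spot labeller inconsistency is to have two people label the same items and compare their answers. The sketch below assumes two hypothetical labellers and made-up labels; `agreement_rate` is an illustrative helper, not part of any library, and simple agreement does not correct for matches that happen by chance.

```python
def agreement_rate(labels_a, labels_b):
    """Return the fraction of items on which two labellers agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labellers must label the same items")
    if not labels_a:
        raise ValueError("at least one labelled item is required")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Made-up labels from two labellers for the same five parts.
labeller_1 = ["good", "good", "defective", "good", "defective"]
labeller_2 = ["good", "defective", "defective", "good", "defective"]

rate = agreement_rate(labeller_1, labeller_2)

# Positions where the labellers disagree and a review may be needed.
disputed = [
    index
    for index, (a, b) in enumerate(zip(labeller_1, labeller_2))
    if a != b
]

print(f"agreement: {rate:.0%}, disputed items: {disputed}")
```

Items the labellers disagree on are natural candidates for review, and frequent disagreement on one class can point to an unclear class definition or unclear instructions.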


It may be evident that a new type of job has emerged to satisfy technology's appetite for labelled data. As opposed to physical production line work in the industrial economy, the AI working class has become part of a data supply chain. Not all annotation jobs, however, require the same level of skill. For example, training an algorithm to detect cancer from CT scans would require an experienced radiologist to annotate the data.


At Jongi Data, we focus on partnering with our clients to provide cost-effective, high-quality data annotation services. Jongi Data assists clients with annotation across a range of data types, including text, image, video and audio datasets. We aim to tailor solutions that are best suited to training custom AI models.


Watch this space for our next segment.





© 2020 by Nine Yard Design
