If you’re in the tech industry (and even if you aren’t), you’ve heard plenty about AI. I’m not talking about the “Skynet takes over the Earth” kind of sci-fi AI we’ve enjoyed for years, but about practical applications of artificial intelligence and machine learning in our everyday lives.
Big data is the lifeblood and underpinning of AI/ML. Huge amounts of data. Lots of data. Or is it? Big data is the engine that powers today’s AI/ML, and the assumption has long been that more is always better, but in recent years organizations have started to shift their thinking: from big data to small and wide data.
Let’s compare the two.
Big data
Big data can be broken down into two challenges.
The first is collecting and organizing large datasets. This is a simple concept, but it can be difficult to execute well. The process requires rapidly ingesting large amounts of mostly unstructured data. The back-end infrastructure to serve this data stream is resource-intensive, demanding the network bandwidth, storage space, and processing power to support large-scale database deployments. And it’s expensive.
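To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The event rate and record size are invented assumptions for illustration, not benchmarks.

```python
# Back-of-the-envelope sketch of why big-data ingestion gets expensive.
# All figures below are illustrative assumptions, not measurements.
events_per_second = 50_000   # e.g. a clickstream feed (assumed rate)
bytes_per_event = 2_000      # ~2 KB of unstructured JSON per event (assumed)

ingest_rate = events_per_second * bytes_per_event   # bytes per second
daily_volume = ingest_rate * 86_400                 # bytes per day

print(f"ingest rate: {ingest_rate / 1e6:.0f} MB/s")
print(f"daily volume: {daily_volume / 1e12:.1f} TB/day")
# About 100 MB/s sustained and 8.6 TB/day, before replication,
# indexing, and backups multiply the storage bill.
```

Even these modest assumptions land in terabytes per day, which is why the bandwidth, storage, and processing costs add up so quickly.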
The second challenge is trickier. Once you have a wealth of data, you need to extract insight and value from it. Technology has evolved to accommodate the sheer size of big data, but less progress has been made on determining what can actually be deduced from these mountains of information.
This is where we need to get smarter. Even in an environment with infinite storage and a perfect NoSQL deployment, all the data in the world is meaningless without the right model to match it.
There is also an opportunity here. Companies are finding practical use cases for less data drawn from more sources, and are pulling better conclusions and correlations out of smaller datasets.
Small and wide data
The small and wide approach doesn’t just pile up more raw material; it looks across a wider variety of sources for correlations. This more tactical approach requires less data and fewer computing resources. Diversity is the name of the game: going small and wide means seeking out diverse data formats, both structured and unstructured, and finding the links between them.
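To make that concrete, here is a minimal sketch (Python, with entirely invented data) of what “wide” can mean in practice: deriving a structured signal from unstructured text and joining it to a small structured table to test for a correlation. The regions, notes, and keyword list are all hypothetical.

```python
# A minimal sketch of the "small and wide" idea: join a tiny structured
# table with unstructured text from another source and look for a link.
import pandas as pd

# Structured source: a few rows of sales figures per region.
sales = pd.DataFrame({
    "region": ["north", "south", "east", "west"],
    "units_sold": [120, 340, 210, 95],
})

# Unstructured source: free-text support notes, one blob per region.
notes = {
    "north": "repeated complaints about late delivery and damaged boxes",
    "south": "customers praise fast shipping and easy returns",
    "east": "mixed feedback, some mention delivery delays",
    "west": "many complaints about delivery delays and missing items",
}

def complaint_score(text: str) -> int:
    """Crude structured signal from unstructured text: count
    occurrences of delivery-problem keywords."""
    keywords = ("late", "delay", "damaged", "missing", "complaint")
    return sum(text.count(word) for word in keywords)

sales["complaint_score"] = sales["region"].map(
    lambda r: complaint_score(notes[r])
)

# With two small, wide-joined columns we can already test a hypothesis:
# do delivery complaints track lower sales?
print(sales)
print("correlation:", sales["units_sold"].corr(sales["complaint_score"]))
```

The point isn’t the keyword counting, which is deliberately crude; it’s that a handful of rows from two very different sources can already support a testable hypothesis.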
According to a 2021 Gartner report: “Potential areas where small and wide data could be used are demand forecasting in retail, real-time behavioral and emotional intelligence in customer service applied to hyper-personalization, and customer experience improvement.”
There are many possibilities, but what does this look like in practice? Large datasets can quickly become unwieldy or stale. Human tendencies and behaviors shift quickly in the Information Age, which is prone to cultural and economic change. There is room for more agile models built on small datasets that can adapt to these changes dynamically.
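As one illustration of that kind of agility, here is a hedged sketch (Python with scikit-learn, simulated data) of a model updated incrementally on small weekly batches so it can track a drifting relationship instead of being retrained from scratch on one giant, stale corpus. The drift pattern and batch sizes are invented for the example.

```python
# A sketch of an "agile" model on small data: update incrementally as
# fresh observations arrive, rather than retraining on a frozen corpus.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
model = SGDRegressor(learning_rate="constant", eta0=0.01)

# Simulate weekly batches of 30 points whose underlying relationship
# drifts over time (the true slope creeps from 2.0 toward 3.0).
for week in range(10):
    slope = 2.0 + week * 0.1
    X = rng.uniform(0, 1, size=(30, 1))
    y = slope * X.ravel() + rng.normal(0, 0.1, size=30)
    model.partial_fit(X, y)  # update in place, no full retrain
    print(f"week {week}: learned slope ~ {model.coef_[0]:.2f}")
```

Each update touches only thirty points, yet the model keeps pace with the change, which is exactly the property a large static dataset struggles to offer.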
A report from Harvard Business Review explains: “Many of an organization’s most valuable data sets are quite small. Think kilobytes or megabytes rather than exabytes. Because this data lacks the volume and velocity of big data, it’s often overlooked, languishing in PCs and functional databases and unconnected to enterprise-wide IT innovation initiatives.”
The report describes an experiment the authors ran with medical coders, highlighting the human factor in training AI on small data. The study is worth reading in full, but the bottom line is this: accounting for the human element, alongside small data, improves your models and can give organizations a competitive edge over rivals still locked in the big data arms race.
In other words: small, wide, and smart data is a winning combination.
Drawing conclusions
What does this all mean? Many books could be (and have been) written on the subject, but let’s take a quick, holistic look at the takeaway. We love powerful PCs, but there comes a point where “more” hits its limits. Even on a top-of-the-line workstation, a poorly optimized piece of software will still run badly.
Throwing more resources at the problem is often unrealistic, and it overlooks the real issue. There is usually a golden opportunity for improvement hiding underneath, and that is exactly what we see with big data today. There are still use cases where huge amounts of data are genuinely needed, but it is important not only to design methods for collecting the data, but also to design models that make the most of it.