In a recent conversation with Ternary Data, AI pioneer Andrew Ng emphasized a significant shift in the world of artificial intelligence. If you’ve been focused on building smarter AI models, it’s time to rethink your strategy. According to Ng, the future of AI isn’t just about better models; it’s about better data.
Here are the key takeaways from Ng’s talk, and why they matter for anyone studying data analytics, business analytics, data science, or management information systems (MIS).
For years, AI development focused on improving models. Practitioners would download a dataset, treat it as fixed, and pour their energy into tweaking the math. While that approach has worked, Ng argues that improving the data often leads to more impactful results. In the data-centric AI approach, clean, curated, high-quality data becomes the priority. Models are still important, but improving the data can often do more for results than further model tuning.
Ng is clear: data engineering is now mission-critical. By his account, AI teams today spend over half of their time wrestling with data-related issues. From finding the right datasets to organizing them for effective use, data engineers are playing an increasingly vital role. As more companies look to deploy AI, there’s a growing need for professionals who understand how to architect and manage data pipelines.
Even with all the buzz around generative AI, Ng points out that it’s still heavily reliant on data. Sure, we talk about training huge models, but behind the scenes, the real work is figuring out how to feed these models the right data. Without solid data, even the most powerful AI models can stumble.
One size doesn’t fit all when it comes to data. Sometimes, small amounts of high-quality data outperform large, messy datasets. It’s not just about gathering massive amounts of data, but rather ensuring that the data you do have is clean, relevant, and usable.
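To make "clean, relevant, and usable" a bit more concrete, here is a minimal sketch of a data audit in Python. It counts three common problems in a small labeled dataset: records with missing fields, exact duplicates, and identical inputs that carry conflicting labels. The record layout (`text`, `label`), the function name `audit_records`, and the sample rows are all assumptions made for illustration; none of them come from Ng's talk.

```python
from collections import Counter, defaultdict


def audit_records(records):
    """Count basic quality problems in a list of {'text', 'label'} dicts."""
    missing = 0
    seen = Counter()                   # (normalized text, label) -> occurrences
    labels_by_text = defaultdict(set)  # normalized text -> set of labels seen

    for rec in records:
        # Normalize so that trivial differences in case or spacing do not hide duplicates.
        text = (rec.get("text") or "").strip().lower()
        label = rec.get("label")
        if not text or label is None:
            missing += 1
            continue
        seen[(text, label)] += 1
        labels_by_text[text].add(label)

    duplicates = sum(count - 1 for count in seen.values())
    conflicting = sum(1 for labels in labels_by_text.values() if len(labels) > 1)
    return {"missing": missing, "duplicates": duplicates, "conflicting": conflicting}


# Hypothetical sample data, invented for this example.
sample = [
    {"text": "Great product", "label": "positive"},
    {"text": "great product", "label": "positive"},  # duplicate after normalization
    {"text": "Arrived late", "label": "negative"},
    {"text": "Arrived late", "label": "positive"},   # same input, conflicting label
    {"text": "", "label": "negative"},               # missing text
]

report = audit_records(sample)
print(report)
```

A real project would add checks specific to its own data, but even a simple count like this shows where a dataset needs attention before any model is trained on it.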
One of the most interesting innovations is the use of synthetic data. Ng shares an example where AI models were trained using puzzles created by earlier versions of the model. This shows how synthetic data, when generated thoughtfully, can accelerate AI development and fill in data gaps.
So, what does all of this mean for students in data analytics, business analytics, data science, or MIS programs? Let’s break it down:
1. Data Engineering is a Hot Career Path
As the demand for clean, structured data grows, data engineering skills are becoming invaluable. This includes mastering cloud services, understanding database systems, and learning how to design scalable data infrastructures. If you can handle complex data pipelines, you’ll find yourself in high demand.
2. Learn to Work with Data, Not Just Models
While learning how to build AI models is important, curating and improving data should be your top priority. If you can perform detailed error analysis and figure out where data quality falls short, you’ll have a major edge in the AI industry.
3. Master Synthetic Data Creation
In industries where real data is scarce, synthetic data can be a lifesaver. Students who understand how to generate and use synthetic data will find more opportunities, especially in cutting-edge AI projects.
4. Prioritize Quality Over Quantity
It’s easy to think “more data is better,” but in practice, data quality often matters more than volume. Learn how to curate, clean, and label data properly. These skills will be critical in building successful AI systems.
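The error analysis mentioned in point 2 can be sketched in a few lines of Python. The idea is to group a model's evaluation results by some attribute of the data and compare error rates per group; a slice with an unusually high error rate is a candidate for better or better-labeled data. The field names (`source`, `predicted`, `actual`), the function `error_rate_by_slice`, and the rows themselves are hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict


def error_rate_by_slice(results, key):
    """Return the fraction of wrong predictions for each value of `key`."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for row in results:
        group = row[key]
        totals[group] += 1
        if row["predicted"] != row["actual"]:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical evaluation results, invented for this example.
results = [
    {"source": "web_form", "predicted": "spam", "actual": "spam"},
    {"source": "web_form", "predicted": "ham", "actual": "ham"},
    {"source": "scanned_pdf", "predicted": "ham", "actual": "spam"},
    {"source": "scanned_pdf", "predicted": "spam", "actual": "spam"},
    {"source": "scanned_pdf", "predicted": "ham", "actual": "spam"},
    {"source": "web_form", "predicted": "spam", "actual": "spam"},
]

rates = error_rate_by_slice(results, "source")
worst = max(rates, key=rates.get)  # the slice with the highest error rate
print(rates, worst)
```

In this made-up sample the errors cluster in one data source, which suggests fixing or relabeling that source before reaching for a bigger model.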
As Ng highlighted in the talk, data-centric AI is where the industry is heading, and students who master the art of data engineering, curation, and management will be leading the charge. Whether you’re in data science, business analytics, or MIS, focusing on data infrastructure will prepare you for an exciting future in AI.
Want to dive deeper? Check out Andrew Ng’s full conversation with Ternary Data on data-centric AI.