You Cannot Build AI on Dirty Data: A Practical Guide for Australian Mid-Market Companies
Most Australian companies that try to implement AI hit the same wall: the data is not ready. Here is how to assess whether your data foundation is AI-ready, and what to do if it is not.
"We want to implement AI" is the most common brief we receive from Australian businesses. The second most common outcome, three weeks into an engagement, is: "we cannot implement AI until we fix our data."
This is not a technology failure. It is a sequencing failure. And it is preventable.
The Australian Mid-Market Data Reality
Most Australian companies with between $20M and $200M in revenue have their data spread across:
- A legacy ERP (MYOB, SAP, Sage, or a custom system from 2008)
- Multiple SaaS tools (HubSpot, Xero, Salesforce, ServiceNow) that do not talk to each other
- Spreadsheets — often hundreds of them, maintained by individual staff
- Siloed databases owned by different departments with no agreed schema
- Historical archives in formats that no current system can read
This is not unusual. It is the default state of a company that has grown organically. The problem is that AI systems require clean, consistent, accessible data to function.
How to Know If Your Data Is AI-Ready
Ask yourself these questions:
1. Can you answer a single question — "What was our revenue by product line last quarter?" — from one system in under 5 minutes?
If not: your data is siloed or inconsistent. AI cannot fix that; it will amplify it.
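To make the test concrete: if the relevant data lives in one queryable store, the question is a single query. A minimal sketch in Python, using an in-memory database; the table and column names (`fact_sales`, `product_line`, `amount`, `sold_at`) are hypothetical stand-ins for your own warehouse schema:

```python
# A minimal sketch of the five-minute test, assuming a single queryable
# store. Table and column names are illustrative, not from any real ERP.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (product_line TEXT, amount REAL, sold_at TEXT)"
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        ("Hardware", 120_000.0, "2024-02-14"),
        ("Services", 85_500.0, "2024-03-02"),
        ("Hardware", 43_250.0, "2024-03-28"),
    ],
)

# With the data in one place, last quarter's revenue by product line
# is one query, not a week of spreadsheet reconciliation.
for line, revenue in conn.execute(
    """
    SELECT product_line, SUM(amount)
    FROM fact_sales
    WHERE sold_at BETWEEN '2024-01-01' AND '2024-03-31'
    GROUP BY product_line
    """
):
    print(f"{line}: ${revenue:,.0f}")
```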
2. Do you have a single source of truth for customer data?
If your customer's name, address, and purchase history are in three different systems with three different formats: you are not ready.
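Here is the problem in miniature: one customer, three systems, three formats. The record layouts below are illustrative, and the normalisation is a sketch of the job a master customer table has to do:

```python
# A hedged sketch of the "single source of truth" problem: the same
# customer represented three different ways. Field layouts are hypothetical.
crm_record   = {"name": "Acme Pty Ltd", "email": "ACCOUNTS@ACME.COM.AU"}
erp_record   = {"cust_name": "ACME PTY LTD", "contact": "accounts@acme.com.au"}
xero_contact = {"Name": "Acme Pty. Ltd.", "EmailAddress": "accounts@acme.com.au"}

def canonical_key(name: str, email: str) -> tuple[str, str]:
    """Normalise the fields every system disagrees on."""
    name = name.lower().replace(".", "").replace(",", "").strip()
    return (name, email.lower().strip())

# All three records should resolve to a single key in the master table.
keys = {
    canonical_key(crm_record["name"], crm_record["email"]),
    canonical_key(erp_record["cust_name"], erp_record["contact"]),
    canonical_key(xero_contact["Name"], xero_contact["EmailAddress"]),
}
print(len(keys))  # 1: one customer, despite three source formats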
3. Can you describe the lineage of your most important dataset?
Where did it come from? What transformations has it been through? Who changed it last and when? If you cannot answer this: your data has no audit trail and AI outputs will be untrustworthy.
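Lineage does not require heavy tooling to start. A minimal sketch, assuming you record a small entry alongside every transformation; the source names, users, and dates below are hypothetical:

```python
# A minimal sketch of lineage metadata. The fields mirror the three
# questions above: where from, what changed, who and when.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class LineageEntry:
    source: str           # where the data came from
    transformation: str   # what was done to it
    changed_by: str       # who made the change
    changed_at: datetime  # when

lineage: list[LineageEntry] = [
    LineageEntry("erp.sales_export", "loaded raw CSV into staging",
                 "pipeline@nightly", datetime(2024, 3, 1, tzinfo=timezone.utc)),
    LineageEntry("staging.sales", "deduplicated on invoice_id, cast amounts to AUD",
                 "jane.doe", datetime(2024, 3, 2, tzinfo=timezone.utc)),
]

# "Who changed this last, and when?" becomes a lookup, not archaeology.
latest = max(lineage, key=lambda e: e.changed_at)
print(latest.changed_by, latest.changed_at.date())
```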
4. How long does it take to get new data from a source system into your analytics?
If the answer is "days" or "we have to ask IT": your pipeline infrastructure needs work before AI can run on top of it.
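If the honest answer is "days", the first step is to measure the lag rather than guess at it. A minimal freshness check, with an assumed 24-hour threshold that you would agree with the business:

```python
# A hedged sketch of a freshness check: measure how far behind your
# analytics copy is, and alert when it exceeds an agreed threshold.
# The threshold is an assumption to adapt to your stack.
from datetime import datetime, timedelta, timezone

MAX_LAG = timedelta(hours=24)  # agree this number with the business

def check_freshness(last_loaded_at: datetime) -> None:
    lag = datetime.now(timezone.utc) - last_loaded_at
    if lag > MAX_LAG:
        # In production this would page someone or post to a channel;
        # the point is that staleness is detected, not discovered.
        print(f"STALE: analytics copy is {lag} behind source")
    else:
        print(f"OK: {lag} behind source")

check_freshness(datetime.now(timezone.utc) - timedelta(hours=30))
```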
The Minimum Viable Data Foundation for AI
You do not need a perfect data platform. You need:
- A single, queryable data store where the data that matters for your AI use case is available. This could be a data warehouse (Snowflake, BigQuery, Redshift), a lakehouse (Databricks), or even a well-structured PostgreSQL database at smaller scales.
- A reliable pipeline that moves data from source systems into that store consistently. Ideally automated, definitely monitored.
- Basic data quality rules that catch and flag bad data before it reaches the AI system (a sketch follows this list). Not perfect data — just data with known quality properties.
- Clear data ownership — who is responsible for each dataset, who can answer questions about it, who is authorised to use it.
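To ground the pipeline and quality-rule bullets above, here is a minimal sketch of validation on the way into the store: flag bad rows rather than silently dropping them. The field names and rules are hypothetical; the real ones come from your specific use case:

```python
# A minimal sketch of "known quality properties": validate rows on the
# way in, load the good ones, and flag the rest for review.
def validate(row: dict) -> list[str]:
    """Return a list of rule violations; empty means the row passes."""
    problems = []
    if not row.get("customer_id"):
        problems.append("missing customer_id")
    if not isinstance(row.get("amount"), (int, float)) or row["amount"] < 0:
        problems.append("amount missing or negative")
    return problems

incoming = [
    {"customer_id": "C-001", "amount": 199.0},
    {"customer_id": "",      "amount": 49.0},   # fails: no customer_id
    {"customer_id": "C-002", "amount": -10.0},  # fails: negative amount
]

clean, flagged = [], []
for row in incoming:
    (flagged if validate(row) else clean).append(row)

print(f"loaded {len(clean)} rows, flagged {len(flagged)} for review")
```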
This is a 6–12 week build for a typical mid-market Australian company. It is not glamorous, but it is the difference between AI that works and AI that does not.
The Sequencing
The right order for most Australian mid-market companies:
1. AI Readiness Sprint — understand current state, identify the highest-ROI use case
2. Data Foundation Build — fix the data infrastructure for that specific use case
3. Agentic Workflow Build — build and deploy the AI system on the cleaned foundation
4. Expand — apply the same pattern to the next use case
Trying to skip step 2 costs more in the end. Every failed AI pilot we have been asked to rescue had skipped the data foundation.
*Akira Data builds data foundations designed for AI workloads. Our Data Foundation Build engagement takes 6–12 weeks and includes the pipeline, warehouse, and quality framework your AI systems need.*