Mapping Artificial Intelligence Terrain

Data, Algorithms, Infrastructure, and Application + Opportunities

Apr 28, 2026

a computer circuit board with a brain on it — Photo by Steve A Johnson on Unsplash

Artificial intelligence, at its most basic, is the capacity of machines to perform tasks that have historically required human cognition: perception, reasoning, learning, and decision-making. Machines generally achieve this capacity by exposing mathematical models to vast quantities of examples or rules, from which the models statistically derive patterns that can then be applied to novel inputs. The build-up to this stretches back to the mid-twentieth century, with Alan Turing’s 1950 question of whether machines could think, followed by the Dartmouth Summer Research Project of 1956, which formally established the field. For decades, progress remained sluggish because computational power and data were limited, and the field oscillated between symbolic, logic-based systems and early neural network experiments that failed to scale.

However, in November 2022, OpenAI launched ChatGPT, a public-facing generative AI system. The launch was made possible by three developments that had matured in parallel and converged at that moment in history. First, computational capacity had grown exponentially, broadly in line with Moore’s Law. Second, the internet and connected devices had produced an enormous body of digital data. Third, there were architectural breakthroughs in deep learning, most notably the Transformer architecture introduced in 2017, which built on earlier advances such as backpropagation and convolutional networks. The release of GPT-3 by OpenAI in 2020 had already illustrated that a sufficiently large Transformer model trained on diverse internet text could perform translation, summarisation, question-answering, and even code generation without task-specific fine-tuning. Yet it was the public launch of ChatGPT that truly thrust generative artificial intelligence into global consciousness. ChatGPT presented an interface through which any user could converse with a model that produced human-like responses, wrote essays, debugged software, and generated creative content. This launch shifted the perception of AI from a background analytical tool to an interactive agent whose capabilities and limitations became the subject of urgent geopolitical and economic scrutiny.

The Four Pillars of Artificial Intelligence: Data, Algorithms, Infrastructure, and Application

For governments and organisations to navigate this terrain with strategic clarity, they need a simple but complete way to see what AI actually is. I see artificial intelligence as resting on four pillars: data, algorithms, infrastructure, and application. These four pillars operate in continuous feedback with one another, and each carries its own set of technical, economic, and governance challenges. You need to understand these challenges before you can decide what level of participation makes sense for your country or organisation.

First, data is the raw material that any intelligent system learns from. It includes everything from structured financial databases to unstructured collections of images, text, sensor logs, or genomic sequences. The quality of this data sets the ceiling on model performance, which is why annotation, labelling, and cleaning are so important. Annotation is the human or automated process of adding metadata to raw data, for example drawing boxes around objects in an image or tagging the part of speech of each word in a sentence. Labelling means assigning the target values that the model is supposed to predict. Cleaning deals with the corruptions, inconsistencies, duplicates, and biases that pile up in real-world datasets. These tasks are labour-intensive and often require expert judgement; by commonly cited industry estimates, they can take up eighty percent or more of the total effort in a serious AI project.
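To make the cleaning and labelling steps concrete, here is a minimal sketch in Python. The records, the label set, and the helper names (`normalise`, `clean`) are all hypothetical and chosen only for illustration; real pipelines involve far more checks, and usually human review.

```python
from collections import Counter

# Hypothetical raw records: (text, label) pairs as they might arrive from collection.
raw_records = [
    ("The service was excellent", "positive"),
    ("the service was excellent ", "positive"),  # duplicate differing only in case/whitespace
    ("Delivery took three weeks", "negative"),
    ("Delivery took three weeks", None),          # missing label
    ("", "positive"),                             # empty text
    ("Average experience overall", "neutral"),
]

# Assumed label set for this toy task.
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def normalise(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def clean(records):
    """Drop empty, unlabelled, or invalidly labelled rows and remove duplicates."""
    seen = set()
    cleaned = []
    for text, label in records:
        key = normalise(text)
        if not key or label not in ALLOWED_LABELS:
            continue  # corrupted or unlabelled row
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        cleaned.append((key, label))
    return cleaned


cleaned_records = clean(raw_records)
# Checking the label balance is a first, crude look at possible bias in the data.
label_counts = Counter(label for _, label in cleaned_records)
print(cleaned_records)
print(label_counts)
```

Even in this toy case, half of the incoming rows are discarded, which hints at why data preparation absorbs so much of a project's effort.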

Second, algorithms are the mathematical architectures and training procedures that turn data into working models. These range from simple linear regressions and decision trees all the way up to the multi-billion-parameter Transformer networks that power today’s generative systems. Models learn by adjusting their internal weights to reduce the error between their predictions and the correct labels, typically through gradient descent, with backpropagation used to compute the gradients. Once trained, a model is made available to users through an interface. ChatGPT is one interface to OpenAI’s GPT models. Claude is Anthropic’s interface to models trained with its Constitutional AI approach. Gemini is Google’s interface to its own models. Each algorithmic choice affects interpretability, computational cost, memory footprint, how easily the model can be fooled, and how often it fabricates information rather than stating facts correctly.
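The idea of adjusting weights to reduce error can be shown with a small sketch. This is an illustration under stated assumptions, not how any production model is trained: the data is a toy set generated from y = 2x + 1, the model is a single straight line, and the function name `train`, the learning rate, and the step count are arbitrary choices. With only one layer, the gradients are written out by hand; in a deep network, backpropagation computes the same kind of gradients layer by layer.

```python
# Toy data generated from y = 2x + 1 (chosen for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def train(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors with the current weights.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train(xs, ys)
print(round(w, 3), round(b, 3))
```

On this data the learned values should end up very close to w = 2 and b = 1, the line the examples were drawn from. Frontier models follow the same basic loop, only with billions of weights instead of two.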

Third, infrastructure is the entire physical and logical stack that makes data storage, algorithm training, and model deployment possible. This starts with specialised chips like NVIDIA and AMD GPUs, Google’s TPUs, and emerging neuromorphic or photonic accelerators. It continues through the networking fabric that connects thousands of these chips into clusters that can train together in sync. Infrastructure also includes the massive databases that hold training corpora, from traditional SQL and NoSQL systems to vector databases built for embedding retrieval, plus the servers that run inference workloads. Critically, infrastructure includes water and land, which are often overlooked. Water is consumed in large volumes for cooling data centres; by some published estimates, a single large model training run can consume hundreds of thousands to millions of litres through evaporative cooling. Land is needed for the physical footprint of data centre campuses, their dedicated power substations, and the renewable energy installations that are increasingly built alongside them to offset carbon emissions.
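For readers unfamiliar with embedding retrieval, the following sketch shows the core operation a vector database performs: finding the stored item whose vector is most similar to a query vector. Everything here is hypothetical, including the three-dimensional vectors, the document names, and the `search` helper; real embeddings have hundreds or thousands of dimensions and come from a trained model.

```python
import math

# Hypothetical 3-dimensional embeddings standing in for model-generated vectors.
documents = {
    "water use in data centres": [0.9, 0.1, 0.0],
    "gpu cluster networking": [0.1, 0.9, 0.1],
    "protein structure prediction": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, store, top_k=1):
    """Return the names of the top_k stored vectors most similar to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# A query vector that points mostly along the first dimension.
query = [0.8, 0.2, 0.1]
print(search(query, documents))
```

This version compares the query against every stored vector. Purpose-built vector databases generally rely on approximate nearest-neighbour indexes so that the same lookup stays fast across millions or billions of items, which is part of why they count as infrastructure in their own right.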

Fourth, application is about deploying trained algorithms, running on top of all this infrastructure and consuming data, toward real use cases that create value or solve problems. This includes scientific breakthroughs like AlphaFold predicting protein structures, a problem biologists had struggled with for decades. It includes sports, through computer vision-based refereeing systems and algorithmic analysis of player movements to enforce rules more fairly and prevent injuries. It includes entertainment, where generative AI now makes photorealistic images, composes music, assists with scriptwriting, and generates interactive stories. And it includes manufacturing, logistics, agriculture, defence, finance, healthcare, education, and government itself.

Operational Context

Take data as an example: someone must collect it, someone must clean it, someone must label it, and those steps determine everything the model can ultimately do.

Evaluating Points of Entry

Evaluating where to enter the AI terrain requires a structured look at what you already have, what you intend to do given your sovereignty concerns and risk tolerance, and how competitive each pillar is. Take the decision to pursue frontier model training. It can demand not just hundreds of millions of dollars for a single training run, but also a sustained engineering team of several hundred researchers, access to the newest chips, which are often booked years in advance, and a willingness to accept that your model could be obsolete within months because of rapid advances elsewhere.

A government or organisation might instead choose to enter at the data pillar. That would mean building national-scale annotated datasets in local languages that commercial models handle poorly, which gives a comparative advantage in applications that depend on linguistic or cultural specifics. This approach costs less in computational power but requires systematic investment in labelling infrastructure, quality assurance procedures, and legal frameworks for data consent and compensation.

The infrastructure pillar offers entry points ranging from building and running data centres to making cooling systems or power management hardware. You could also get involved in chip design at the edges, for example making inference accelerators rather than training GPUs, or securing water rights in regions where data centre expansion is expected to strain local supplies. All these options need a lot of capital but rely less on the rare algorithmic expertise that defines the frontier.

The application pillar gives you the most entry points. Any domain-specific problem can be tackled by combining open-source algorithms with custom fine-tuning on locally relevant data, then deploying on rented cloud infrastructure. This includes climate modelling with uncertainty quantification, supply chain optimisation under disruption, early warning systems for disease outbreaks or financial crises, and adaptive educational software for personalised learning. What you need most here is domain expertise and modest engineering capacity, not foundational research breakthroughs.

Whether a point of entry is wise depends on how well it matches what your organisation already does well.

Global Inequality as a Structural Feature of AI Competition

The most serious risk in the current trajectory of artificial intelligence is that it will widen existing global inequalities across several dimensions at once. The already concentrated economic and technological power of a few nations and corporations could become self-reinforcing through data network effects, algorithmic talent clustering, infrastructure economies of scale, and first-mover advantages in deploying applications.

Countries that lack the money to build frontier models face not just a delay in access, but a structural exclusion from shaping the architectures and training methods that will set the path for generations of AI systems. Once a model like GPT-4 or its successors is trained on massive multilingual data, its internal representations encode the cultural and epistemic assumptions of the engineers and datasets that made it. This process can marginalise or erase the languages, knowledge systems, and problem-solving approaches of societies that did not take part in that training.

The infrastructure pillar shows the starkest material inequality. Advanced chips are subject to export controls that explicitly aim to stop certain nations from training frontier models. At the same time, the water and land used by large-scale AI impose costs that fall disproportionately on communities located near data centres, often low-income or indigenous populations whose resources are taken to cool the servers that generate economic value for distant shareholders.

Even inside wealthy nations, applying AI to hiring, lending, medical triage, criminal justice, and social services has repeatedly been shown to encode historical biases that penalise already marginalised groups. The opacity of state-of-the-art models makes auditing these outcomes technically difficult even for well-funded civil society organisations.

For organisations and individuals in the global periphery, having generative AI available as a service can create a dependency.

Local value generation like writing, translation, customer support, and basic coding gets replaced by API calls to models hosted in foreign data centres. This extracts value from the very cognitive tasks that used to be pathways to economic mobility, while leaving few chances to gain the deeper algorithmic literacy needed to escape that dependency.

Governments that fail to see widening inequality as a built-in feature of unconstrained AI competition, rather than a bug, will find themselves reacting to crises of job displacement, cultural erosion, and geopolitical subordination.

These crises could have been reduced through deliberate, multi-pillar strategies of building capacity, negotiating technology transfer agreements, supporting open-source model governance, and creating international infrastructure financing mechanisms like those used in the past to address unequal access to nuclear technology, satellite communications, and pharmaceutical manufacturing.

The choices made today in each of the four pillars will determine our human future.

On Second Thought by Mthokozisi Mabhena
