AI and Crypto Assets: The Emerging Industry Chain Pattern Under Technological Innovation

AI x Crypto: From Zero to Peak

Introduction

The recent developments in the artificial intelligence industry are viewed by some as the Fourth Industrial Revolution. The emergence of large language models has significantly improved efficiency across many industries; by some estimates, it has raised work efficiency in the United States by roughly 20%. Meanwhile, the generalization capability of large models is considered a new software design paradigm: where software was once built from precisely specified code, it is now increasingly a matter of embedding generalized large-model frameworks that support a broader range of modal inputs and outputs. Deep learning technology has brought a new wave of prosperity to the AI industry, and this wave has also reached the cryptocurrency sector.

In this report, we will explore in detail the development history of the AI industry, the classification of technologies, and the impact of deep learning technology on the industry. We will then analyze the current status and trends of the upstream and downstream of the industrial chain in deep learning, including GPUs, cloud computing, data sources, and edge devices. Finally, we will fundamentally explore the relationship between the Crypto and AI industries, outlining the structure of the Crypto-related AI industrial chain.


The Development History of the AI Industry

The AI industry began in the 1950s. To realize the vision of artificial intelligence, academia and industry have developed various schools of thought for achieving artificial intelligence under different disciplinary backgrounds in different eras.

Modern artificial intelligence technology mainly uses the term "machine learning", which is the concept of allowing machines to iteratively improve system performance in tasks based on data. The main steps are to feed data into algorithms, train models with this data, test and deploy the models, and use the models to complete automated prediction tasks.

Currently, there are three main schools of thought in machine learning: connectionism, symbolicism, and behaviorism, which respectively mimic the human nervous system, thinking, and behavior.


Currently, connectionism, represented by neural networks (also known as deep learning), is in the ascendancy. The main reason is that this architecture has one input layer, one output layer, and multiple hidden layers. Once the number of layers and neurons (and thus parameters) is large enough, the network has enough capacity to fit complex, general tasks. By feeding in data, the parameters of the neurons can be continuously adjusted; after many passes over the data, the neurons reach an optimal state (set of parameters). This is also the origin of the word "deep": sufficiently many layers and neurons.

For example, it can be understood simply as constructing a function. When we observe X=2, Y=3 and X=3, Y=5, and we want this function to handle all values of X, we need to keep raising the degree of the function and adding parameters. A function that fits these two points is Y = 2X - 1. However, if a new data point X=4, Y=11 arrives, we need to reconstruct a function that suits all three points. Searching by brute force on a GPU, we find that Y = X² - 3X + 5 is more appropriate; it does not need to match the data exactly, only to stay balanced and give roughly similar outputs. Here, X², X, and X⁰ play the role of different neurons, while 1, -3, and 5 are their parameters.

At this time, if we input a large amount of data into the neural network, we can increase the neurons and iterate parameters to fit the new data. This way, we can fit all the data.
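The idea of iterating parameters to fit data pairs can be sketched in a few lines. The following toy example, a minimal sketch with illustrative names (not from any framework), recovers the Y = 2X - 1 line from the two data pairs in the text by gradient descent:

```python
# Toy illustration of parameter fitting: recover Y = 2X - 1 from the two
# data pairs mentioned above, (2, 3) and (3, 5), by gradient descent.
data = [(2.0, 3.0), (3.0, 5.0)]
a, b = 0.0, 0.0           # the "parameters" of our single-neuron model
lr = 0.05                 # learning rate

for _ in range(5000):     # repeated passes over the data, iterating parameters
    grad_a = grad_b = 0.0
    for x, y in data:
        err = (a * x + b) - y          # prediction error
        grad_a += 2 * err * x          # d(err^2)/da
        grad_b += 2 * err              # d(err^2)/db
    a -= lr * grad_a / len(data)
    b -= lr * grad_b / len(data)

print(round(a, 2), round(b, 2))   # approaches 2.0 and -1.0
```

A real neural network does the same thing at vastly larger scale: millions or billions of parameters adjusted step by step to reduce prediction error.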

Deep learning based on neural networks has itself gone through multiple technical iterations and evolutions, from the earliest neural networks, through feedforward networks, RNNs, CNNs, and GANs, to modern large models such as GPT that use Transformer technology. The Transformer is just one evolutionary direction of neural networks: it adds a converter (the Transformer) that encodes data from all modalities (such as audio, video, and images) into corresponding numerical representations. These are then fed into the neural network, allowing it to fit any type of data and thereby achieve multimodality.

AI development has undergone three technological waves. The first occurred in the 1960s, a decade after AI was first proposed. This wave was driven by symbolicist techniques, which addressed problems in general natural language processing and human-computer dialogue. In the same period, expert systems were born, exemplified by the DENDRAL system developed at Stanford University. DENDRAL possessed very strong chemistry knowledge and inferred answers through questioning, much like a chemistry expert; it can be seen as the combination of a chemistry knowledge base and an inference engine.

After expert systems, in the late 1980s the Israeli-American computer scientist and philosopher Judea Pearl proposed Bayesian networks, also known as belief networks. Around the same period, Rodney Brooks introduced behavior-based robotics, marking the birth of behaviorism.

In 1997, IBM's chess program Deep Blue defeated world champion Garry Kasparov 3.5:2.5. The victory was seen as a milestone for artificial intelligence, marking the peak of the second wave of AI development.

The third wave of AI technology began in 2006, when Geoffrey Hinton, one of the "three giants" of deep learning alongside Yann LeCun and Yoshua Bengio, proposed deep learning, an algorithm that uses artificial neural networks as its architecture to perform representation learning on data. Deep learning algorithms then gradually evolved, from RNNs and GANs to Transformers and Stable Diffusion, with the latter two jointly shaping this third wave and marking the peak of connectionism.

Many iconic events have gradually emerged alongside the exploration and evolution of deep learning technology, including:

  • In 2011, IBM's Watson defeated human contestants and won the championship on the quiz show Jeopardy!.

  • In 2014, Goodfellow proposed the GAN (Generative Adversarial Network), which learns by letting two neural networks compete against each other and can generate photorealistic images. Goodfellow also co-authored the textbook "Deep Learning" (with Yoshua Bengio and Aaron Courville), which became one of the most important introductory books in the field.

  • In 2015, LeCun, Bengio, and Hinton published the review paper "Deep Learning" in Nature, which immediately caused a huge response in both academia and industry.

  • In 2015, OpenAI was founded, receiving a joint investment of $1 billion from several well-known investors.

  • In 2016, AlphaGo, based on deep learning technology, competed against Go world champion and professional 9-dan player Lee Sedol in a man-machine Go battle, winning with a total score of 4 to 1.

  • In 2017, Sophia, a humanoid robot developed by Hanson Robotics, was granted Saudi Arabian citizenship; it possesses a wide range of facial expressions and the ability to understand human language.

  • In 2017, Google published the paper "Attention Is All You Need", which introduced the Transformer architecture and marked the beginning of large-scale language models.

  • In 2018, OpenAI released GPT, built on the Transformer architecture, the first model in what became one of the most prominent families of large language models.

  • In 2018, DeepMind released AlphaFold, which is based on deep learning and can predict protein structures, regarded as a significant advancement in the field of artificial intelligence.

  • In 2019, OpenAI released GPT-2, which has 1.5 billion parameters.

  • In 2020, OpenAI developed GPT-3, which has 175 billion parameters, 100 times more than the previous version GPT-2. The model was trained on 570GB of text and can achieve state-of-the-art performance on various NLP tasks.

  • In 2023, OpenAI released GPT-4, reported (though never officially confirmed) to have about 1.76 trillion parameters, roughly ten times that of GPT-3.

  • ChatGPT, initially based on GPT-3.5, launched in November 2022 and reached one hundred million users by early 2023, becoming the fastest application in history to do so.

  • In 2024, OpenAI launched GPT-4o (omni).


Deep Learning Industry Chain

Today's large language models are all based on deep learning methods using neural networks. Led by GPT, large models have created a wave of enthusiasm for artificial intelligence, attracting a large number of players into the field, and market demand for data and computing power has surged. In this section of the report we therefore explore the industrial chain of deep learning algorithms: in an AI industry dominated by deep learning, how are the upstream and downstream composed, what is their current state, what is the supply-and-demand relationship, and how will they develop?

First, we need to clarify that training a large model based on Transformer technology, such as GPT and other LLMs, is divided into three steps.

Before training, because the model is based on the Transformer, the converter must first turn text input into numerical values, a process known as tokenization; the resulting values are called tokens. As a general rule of thumb, one English word or symbol can be counted as roughly one token, while one Chinese character can be counted as roughly two tokens. This is also the basic unit used in GPT's pricing.
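The rule of thumb above can be turned into a rough token counter. This is only a sketch of the heuristic stated in the text; real tokenizers (such as BPE-based ones) split text quite differently, and the function name here is illustrative:

```python
# Rough token estimate using the rule of thumb from the text:
# ~1 token per English word, ~2 tokens per Chinese character.
# Real tokenizers (e.g. BPE) behave differently; this is an approximation.
def estimate_tokens(text: str) -> int:
    tokens = 0
    for ch in text:
        if '\u4e00' <= ch <= '\u9fff':   # CJK unified ideographs
            tokens += 2                   # each Chinese character ≈ 2 tokens
    # count whitespace-separated words containing Latin letters
    for word in text.split():
        if any('a' <= c.lower() <= 'z' for c in word):
            tokens += 1
    return tokens

print(estimate_tokens("hello world"))  # → 2
```

Such an estimate is useful for quickly approximating API costs, since pricing is quoted per token.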

Step one, pre-training. By providing enough data pairs to the input layer, similar to the (X, Y) examples in the first part of this report, we search for the optimal parameters of each neuron in the model. This stage requires a large amount of data and is also the most computationally intensive, since the neurons must be iterated over repeatedly while trying various parameters. After one batch of data pairs has been trained, the same batch is generally reused for a second pass to further iterate the parameters.

Step two, fine-tuning. Fine-tuning involves training on a smaller batch of high-quality data, which can significantly enhance the quality of the model's output. Pre-training requires a large amount of data, but much of it may contain errors or be of low quality. The fine-tuning step can improve the model's quality through high-quality data.

Step three, reinforcement learning. First, a brand-new model is built, which we call the "reward model". Its purpose is very simple: to rank output results. Implementing it is therefore relatively easy, since the business scenario is quite vertical. This reward model is then used to judge whether the output of our large model is of high quality, so that it can automatically iterate the large model's parameters. (However, sometimes human involvement is also needed to assess the quality of the model's output.)
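The ranking mechanics of the reinforcement-learning step can be sketched as follows. The scorer here is a trivial stand-in (longer answer = better), purely to show the structure; an actual reward model is itself a trained neural network, and all names are illustrative:

```python
# Hedged sketch of the reward-model step: a scorer ranks candidate
# outputs, and that ranking signal is what drives parameter updates
# in the large model. The scoring rule below is a deliberate toy.
def reward_model(candidate: str) -> float:
    return float(len(candidate))      # placeholder scoring rule

def rank_outputs(candidates):
    # Sort best-first by reward score.
    return sorted(candidates, key=reward_model, reverse=True)

print(rank_outputs(["ok", "a fuller answer", "short"])[0])
```

In practice the reward model is trained on human preference comparisons, which is why the step is often described as reinforcement learning from human feedback.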

In short, during the training process of large models, pre-training has very high requirements for the amount of data, and the GPU computing power needed is also the most. Fine-tuning requires higher quality data to improve parameters, and reinforcement learning can iteratively adjust parameters through a reward model to produce higher quality results.

During training, the more parameters there are, the higher the ceiling of the model's generalization ability. For example, the function Y = aX + b effectively has two neurons, X and X⁰. No matter how the parameters vary, the data this function can fit is extremely limited, because it remains a straight line. With more neurons, more parameters can be iterated, and more data can be fitted. This is why large models achieve such striking results, and it is also why they are called large models: in essence, massive numbers of neurons and parameters plus massive amounts of data, which in turn demand massive computing power.

Therefore, the performance of a large model is mainly determined by three things: the number of parameters, the amount and quality of data, and computing power. Together these determine the output quality and generalization ability of the model. Suppose the number of parameters is p and the amount of data is n (measured in tokens). We can then estimate the required compute using general rules of thumb, which lets us roughly budget the computing power to purchase and the training time.

Computing power is generally measured in FLOPs, each of which represents a single floating-point operation. Floating-point operations are the general term for addition, subtraction, multiplication, and division on non-integer numbers, such as 2.5 + 3.557; "floating point" refers to the ability to represent numbers with a decimal point. FP16 and FP32 denote common precisions, i.e. 16-bit and 32-bit floating-point formats. Based on empirical rules from practice, pre-training (which generally involves multiple passes over the data) requires approximately 6np FLOPs, where 6 is known as an industry constant. Inference (the process where we input data and wait for the large model's output) consists of two parts, inputting n tokens and outputting n tokens, and thus requires about 2np FLOPs in total.
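These two rules of thumb make back-of-envelope budgeting straightforward. The sketch below just encodes the 6np and 2np heuristics from the text; the example figures (175 billion parameters, 300 billion training tokens, a commonly cited pairing for a GPT-3-scale model) are rough illustrations, not exact costs:

```python
# Back-of-envelope compute estimates from the rules of thumb above:
# training ≈ 6 * n * p FLOPs, inference ≈ 2 * n * p FLOPs,
# where p = parameter count and n = token count.
def training_flops(p: float, n: float) -> float:
    return 6 * n * p

def inference_flops(p: float, n: float) -> float:
    return 2 * n * p

# Illustrative GPT-3-scale figures: 175B parameters, 300B training tokens.
p, n = 175e9, 300e9
print(f"training: {training_flops(p, n):.2e} FLOPs")   # ~3.15e23
```

Dividing such a total by the sustained FLOPs per second of a given GPU fleet gives a rough training-time estimate.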

In the early days, CPUs were used to provide the computational power for training, but GPUs gradually replaced them, for example Nvidia's A100 and H100 chips. This is because CPUs are designed for general-purpose computing, while GPUs serve as specialized hardware for the massively parallel arithmetic that training requires.
