Artificial Intelligence
A comprehensive guide to modern AI: from transformer architecture and LLMs to training infrastructure, model scaling, and real-world deployment costs. Learn how today's AI systems process thousands of years of human knowledge and a new super intelligence artificial specie rises between humanity.
The invention of the transformer by Google researchers in 2015 unexpectedly revolutionized artificial intelligence. AI systems now can communicate, reason and produce high quality multimedia content: text, image, audio, video
Foundation & Architecture
Modern AI and Large Language Models (LLMs) are built on:
- Self-supervised learning from vast unlabeled data (to simplify, they have "read" all the web content and all the books in the world)
- Transformer architecture, this is the underlying architecture for GPT based models. Newer architectures like Mamba are being developed
- It basically performs next-word prediction through supervised learning
- Advanced techniques like RLVR (Reinforcement Learning with Verifiable Rewards) are wired into public facing applications (eg. ChatGPT)
Training Process & Scale
- Trained on thousands of years of data (Wikipedia, Internet, Academic Papers, Books)
- Hardware: 20K+ GPUs (NVIDIA A100/H100)
- Process: Words → Connections (trillion tokens) → Concepts → LLM
- Origin: Google Labs 2015 (mainstream in 2022)
- Example Scales:
- GPT-5 = 120 days training = 16K years of human history
- Modern 13B model ≈ 5.6T tokens (roughly 4 chars per token)
Technical Details & Requirements
- Training Philosophy: Quality > Quantity (smarter models need less data)
- Performance measured by scores using tests (Q&A): e.g., Mixtral 8x22B • 10/Apr/2024 • MMLU=77.3
- Model Components:
- Tokens (T) = Training data units (≈ 4 characters each)
- Parameters (B) = Model's "brain cells" (neural network variables)
- Typical ratio ≈ 50:1 (tokens:parameters)
- Usage: Available with/without streaming (streaming = faster)
- Hardware Requirements: 80B parameter model needs ~ 8 NVIDIA H100s (~$40K each = ~$320K); a 2B parameter model can run on a Macbook M1 Pro w/ 32GB RAM.
LLM Categories
Category | Type | Description | Examples |
---|---|---|---|
Unimodal LLMs | Text-based LLMs | Models trained exclusively on text data to understand and generate human-like textual content. | GPT-4, BERT, T5 |
Multimodal LLMs | Text to Image Models | Models trained to convert textual descriptions into corresponding images. | DALL-E3, Imagen2 |
Text to Video Models | Models trained to generate video content from textual descriptions. | Stable Video Diffusion | |
Vision Language Models | Models trained to understand and/or generate content involving both visual and textual data. | CLIP, ALIGN | |
Audio-Text Models | Models that work with both audio/speech and text data, such as speech recognition or synthesis. | Whisper, Wav2Vec | |
Video-Text Models | Models that handle both video and text data for tasks like video captioning or search. | VideoBERT, CBT | |
General Multimodal Models | Models that can process and generate a variety of data types, including text, images, and audio. | Perceiver, Flamingo |
AI Evolution & Capabilities (2024-2025)
Current Landscape
- 488 major models deployed
- ~162,000 models on Hugging Face
- Key observation: Model size no longer directly correlates with capability
- Estimated IQ range of top models: 150-160 (human average: 100)
Intelligence Classification
- AGI (2025-2027): Matching median human capabilities
- ASI (Artificial Superintelligence): Exceeding peak human performance
- Expected capabilities:
- Physical inventions
- New element discovery
- Novel computing materials
- AI-operated trillion-dollar enterprises
- Expected capabilities:
Infrastructure Developments
- NVIDIA GB200 chips powering next-gen models (OpenAI, XAI)
- Focus areas:
- Medical research
- Energy solutions
- Governance optimization
- Complex problem-solving
Note: This field evolves rapidly. For latest developments, see our AI Labs section.
Short Term Memory Comparison
Human (Average) | Gemini 1.5 (Google/DeepMind) | Claude-2 (Anthropic) | GPT-4Turbo (OpenAI) | |
---|---|---|---|---|
Size (Tokens/Words) | 7 words | 7M words | 75K words | 24K words |
List of AI Features/Capabilities
- AI is far more intelligent that any Human (A. Thompson)
- AI is out-performing humans in creativity (A. Thompson)
- AI is hitting ceilings of benchmarks (aka no human has capabilities to test it further)
- Current LLMs should be seen and called as Super Intelligence: more advanced than any human being on any field
- GPT-4 (03/2023) is the first 1T parameters model (human brain has 128 Trillion Neurons)
- Governments are trying to come with regulations/plans/whatever to accommodate AGI.
- Pause on the race to AGI has been requested by some investors
- Some AI Labs are becoming secretive black boxes
- In the evolution of this Super Intelligence we may be creating "GOD" (as we refer to; by some philosophers)
- Passing all tests (SAT's, Medicine, Maths, Physics)
Safety & Compliance Testing
Leading models undergo rigorous NIST pre-deployment testing, including:
- CBRN Threat Assessment: Evaluating model responses to chemical, biological, radiological, and nuclear queries
- Behavioral Testing:
- Persuasion and manipulation resistance
- Deception detection capabilities
- Response to adversarial prompts
- Red Team Evaluations: Systematic vulnerability testing by security experts
*Note: Specific testing protocols vary by provider and deployment context.
Top LLM's
Most efficient generic text LLMs generally/publicly available as of 03/2024 (this is outdated):
Compare top models here at AI Spectrum on the monthly updated table!
Category | Model |
---|---|
General Capability | Claude 3 Opus |
GPT-4 Turbo (1.7T MoE) | |
Command-R+ | |
Claude 3 Haiku | |
Mistral Large | |
Command-R | |
Gemini 1.0 Pro (Dense, MultiModal) | |
Mixtral 8x7B | |
GPT-3.5 Turbo | |
Mistral 7 | |
Reasoning & Knowledge (MMLU) | Claude 3 Opus |
GPT-4 Turbo | |
Llama 3 (70B) | |
Gemini 1.5 Pro | |
Mistral Large | |
Mixtral 8x22B | |
Command-R+ | |
Claude 3 Haiku | |
DBRX | |
Gemini 1.0 Pro | |
Mixtral 8x7B | |
GPT-3.5 Turbo | |
Llama 3 (8B) | |
Command-R | |
Mistral 7B | |
Coding (HumanEval) | Qwen2.5-Coder |
GPT-4 Turbo | |
Llama 3 (70B) | |
GPT-3.5 Turbo | |
Gemini 1.5 Pro | |
DBRXGemini 1.0 Pro | |
Llama 3 (8B) | |
Highest Quality | Claude 3 Opus |
GPT-4 Vision | |
GPT-4 Turbo | |
Highest Throughput | Llama 3 (8B) |
Gemma 7B | |
Command-R | |
Lowest Latency | Command-R |
Command-R+ | |
Mistral Small | |
Largest Context Window | Gemini 1.5 Pro |
Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku, Claude 2.1 | |
GPT-4 Turbo, GPT-4 Vision | |
Command-R+, Command-R |
Other Notable Models:
- Baidu: Ernie 4.0 (1T Dense) *CHINESE
- MetaAI: LLama 2 (Dense)
- Amazon: Olympus (2T Dense)
- Anthropic: Claude-2 (Dense)
- Google/Deepmind: Gemini Ultra (1.xT Dense)
- OpenAI: GPT-4 Turbo
Subscribe to AI Spectrum
Stay updated with weekly AI News and Insights delivered to your inbox