Artificial Intelligence

A comprehensive guide to modern AI: from transformer architecture and LLMs to training infrastructure, model scaling, and real-world deployment costs. Learn how today's AI systems process thousands of years of human knowledge, and how a new artificial superintelligence may be rising among humanity.

The invention of the transformer by Google researchers in 2017 unexpectedly revolutionized artificial intelligence. AI systems can now communicate, reason, and produce high-quality multimedia content: text, images, audio, and video.

Foundation & Architecture

Modern AI and Large Language Models (LLMs) are built on:

  • Self-supervised learning from vast unlabeled data (to simplify, they have "read" all the web content and all the books in the world)
  • Transformer architecture, the backbone of GPT-style models; newer architectures such as Mamba are also being developed
  • At their core, they perform next-token prediction, learned in a self-supervised way from raw text
  • Post-training techniques such as RLHF (Reinforcement Learning from Human Feedback) and RLVR (Reinforcement Learning with Verifiable Rewards) are wired into public-facing applications (e.g., ChatGPT)
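The next-token objective above can be illustrated with a toy bigram model — a deliberately simplified sketch (real LLMs use transformer networks over subword tokens, and the tiny corpus here is a hypothetical stand-in for web-scale text):

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "all the web content and all the books".
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigrams: how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation seen in training."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" more often than any other word
```

A real LLM replaces the frequency table with billions of learned parameters and conditions on the whole preceding context, but the training signal — predict the next token — is the same.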

Training Process & Scale

  • Trained on thousands of years' worth of human output (Wikipedia, the open Internet, academic papers, books)
  • Hardware: 20K+ GPUs (NVIDIA A100/H100)
  • Process: Words → Connections (trillions of tokens) → Concepts → LLM
  • Origin: Google, 2017 ("Attention Is All You Need"); mainstream adoption in 2022
  • Example Scales:
    • GPT-5: reportedly ~120 days of training, on data spanning ~16K years of human knowledge
    • Modern 13B model ≈ 5.6T tokens (roughly 4 chars per token)
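The token figures above translate into more intuitive quantities with back-of-the-envelope arithmetic (the characters-per-book figure is an assumption chosen for illustration):

```python
# Rough scale arithmetic for a modern 13B-parameter model (figures from this section).
tokens = 5.6e12           # ~5.6T training tokens
chars_per_token = 4       # rule-of-thumb average for English text
chars_per_book = 500_000  # assumption: ~500K characters in a typical book

total_chars = tokens * chars_per_token
equivalent_books = total_chars / chars_per_book

print(f"{total_chars:.2e} characters, roughly {equivalent_books / 1e6:.0f} million books")
```

In other words, a single mid-sized model's training set is on the order of tens of millions of books' worth of text.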

Technical Details & Requirements

  • Training Philosophy: Quality > Quantity (well-curated data lets smaller models match larger ones)
  • Performance is measured by benchmark scores on standardized tests (Q&A): e.g., Mixtral 8x22B • 10/Apr/2024 • MMLU = 77.3
  • Model Components:
    • Tokens (T) = Training data units (≈ 4 characters each)
    • Parameters (B) = Model's "brain cells" (neural network variables)
    • Typical ratio ≈ 50:1 (tokens:parameters), well above the ≈ 20:1 compute-optimal ratio suggested by the Chinchilla scaling laws
  • Usage: Available with/without streaming (streaming returns tokens as they are generated, so responses feel faster)
  • Hardware Requirements: an 80B-parameter model needs ~8 NVIDIA H100s (~$40K each = ~$320K); a 2B-parameter model can run on a MacBook M1 Pro with 32GB RAM.
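The hardware figures above follow from bytes-per-parameter arithmetic. A minimal sketch, assuming fp16/bf16 weights (2 bytes per parameter) and a 1.5x overhead factor for KV cache and activations; note that real deployments, like the ~8x H100 figure above, often provision extra GPUs for batch throughput and redundancy beyond this memory-only minimum:

```python
import math

# Back-of-envelope GPU-count estimate for serving a large model.
# Assumptions: fp16/bf16 weights (2 bytes/param), 1.5x memory overhead.
def gpus_needed(params, gpu_memory_gb=80, bytes_per_param=2, overhead=1.5):
    """Minimum GPUs to hold the model in memory (ignores throughput needs)."""
    needed_gb = params * bytes_per_param * overhead / 1e9
    return math.ceil(needed_gb / gpu_memory_gb)

print(gpus_needed(80e9))  # 80B model across 80GB H100s: memory floor of 3 GPUs
print(gpus_needed(2e9))   # 2B model fits comfortably on a single device
```

The same arithmetic explains the MacBook case: 2B parameters × 2 bytes ≈ 4 GB of weights, which fits easily in 32 GB of unified memory.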

LLM Categories

  • Unimodal LLMs
    • Text-based LLMs: Models trained exclusively on text data to understand and generate human-like textual content. Examples: GPT-4, BERT, T5
  • Multimodal LLMs
    • Text-to-Image Models: Models trained to convert textual descriptions into corresponding images. Examples: DALL-E 3, Imagen 2
    • Text-to-Video Models: Models trained to generate video content from textual descriptions. Examples: Stable Video Diffusion
    • Vision-Language Models: Models trained to understand and/or generate content involving both visual and textual data. Examples: CLIP, ALIGN
    • Audio-Text Models: Models that work with both audio/speech and text data, such as speech recognition or synthesis. Examples: Whisper, Wav2Vec
    • Video-Text Models: Models that handle both video and text data for tasks like video captioning or search. Examples: VideoBERT, CBT
    • General Multimodal Models: Models that can process and generate a variety of data types, including text, images, and audio. Examples: Perceiver, Flamingo
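Vision-language models like CLIP work by embedding images and text into a shared vector space and scoring pairs by cosine similarity. A conceptual sketch with hand-made stand-in vectors (no real model is loaded; a real CLIP-style system would produce these embeddings from jointly trained image and text encoders):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Stand-in embeddings for illustration only.
image_of_cat = [0.9, 0.1, 0.2]
captions = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "a photo of a car": [0.1, 0.9, 0.3],
}

# Pick the caption whose embedding points in nearly the same direction
# as the image embedding -- the core of CLIP-style retrieval.
scores = {text: cosine(image_of_cat, vec) for text, vec in captions.items()}
best = max(scores, key=scores.get)
print(best)
```

Because similarity is computed in a shared space, the same mechanism supports zero-shot classification: score an image against a caption for each candidate label and pick the highest.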

AI Evolution & Capabilities (2024-2025)

Current Landscape

  • 488 major models deployed
  • ~162,000 models on Hugging Face
  • Key observation: Model size no longer directly correlates with capability
  • Informal estimates place the IQ-equivalent of top models at 150-160 (human average: 100)

Intelligence Classification

  • AGI (forecast by some for 2025-2027): Matching median human capabilities
  • ASI (Artificial Superintelligence): Exceeding peak human performance
    • Expected capabilities:
      • Physical inventions
      • New element discovery
      • Novel computing materials
      • AI-operated trillion-dollar enterprises

Infrastructure Developments

  • NVIDIA GB200 chips powering next-gen models (OpenAI, xAI)
  • Focus areas:
    • Medical research
    • Energy solutions
    • Governance optimization
    • Complex problem-solving

Note: This field evolves rapidly. For latest developments, see our AI Labs section.

Short Term Memory Comparison

  • Human (average): ~7 words (working-memory span)
  • Gemini 1.5 (Google/DeepMind): ~7M words
  • Claude-2 (Anthropic): ~75K words
  • GPT-4 Turbo (OpenAI): ~24K words
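The word counts above are usually derived from token-denominated context windows via a rule of thumb of ~0.75 words per token (~4 characters per token). A small conversion sketch — the 100K-token figure is Claude-2's published window; the other table entries follow the same arithmetic:

```python
# Convert a context window from tokens to an approximate word count,
# using the common ~0.75 words-per-token rule of thumb for English.
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens):
    return int(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(100_000))  # Claude-2's 100K-token window, in words
```

The ratio varies by language and tokenizer, so treat these conversions as estimates, not exact capacities.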

List of AI Features/Capabilities

  • AI is far more intelligent than any human (A. Thompson)
  • AI is out-performing humans in creativity (A. Thompson)
  • AI is saturating benchmarks (i.e., existing tests can no longer measure the top models' capabilities)
  • Some argue current LLMs should already be seen as a form of superintelligence: more advanced than any single human across many fields
  • GPT-4 (03/2023) is reportedly the first ~1T-parameter model (for comparison, the human brain has on the order of 100 trillion synapses)
  • Governments are drafting regulations and policy plans to accommodate AGI
  • A pause on the race to AGI has been requested by some investors and researchers
  • Some AI labs are becoming secretive black boxes
  • Some philosophers suggest that, in evolving this superintelligence, we may be creating what they refer to as a "god"
  • Passing professional and academic exams (SATs, medicine, maths, physics)

Safety & Compliance Testing

Leading models undergo pre-deployment safety testing (e.g., in collaboration with NIST's US AI Safety Institute), including:

  • CBRN Threat Assessment: Evaluating model responses to chemical, biological, radiological, and nuclear queries
  • Behavioral Testing:
    • Persuasion and manipulation resistance
    • Deception detection capabilities
    • Response to adversarial prompts
  • Red Team Evaluations: Systematic vulnerability testing by security experts

Note: Specific testing protocols vary by provider and deployment context.

Top LLMs

The most capable general-purpose text LLMs publicly available as of 03/2024 (note: this list is outdated):

Compare top models on AI Spectrum's monthly updated table.

  • General Capability: Claude 3 Opus, GPT-4 Turbo (1.7T MoE), Command-R+, Claude 3 Haiku, Mistral Large, Command-R, Gemini 1.0 Pro (Dense, Multimodal), Mixtral 8x7B, GPT-3.5 Turbo, Mistral 7B
  • Reasoning & Knowledge (MMLU): Claude 3 Opus, GPT-4 Turbo, Llama 3 (70B), Gemini 1.5 Pro, Mistral Large, Mixtral 8x22B, Command-R+, Claude 3 Haiku, DBRX, Gemini 1.0 Pro, Mixtral 8x7B, GPT-3.5 Turbo, Llama 3 (8B), Command-R, Mistral 7B
  • Coding (HumanEval): Qwen2.5-Coder, GPT-4 Turbo, Llama 3 (70B), GPT-3.5 Turbo, Gemini 1.5 Pro, DBRX, Gemini 1.0 Pro, Llama 3 (8B)
  • Highest Quality: Claude 3 Opus, GPT-4 Vision, GPT-4 Turbo
  • Highest Throughput: Llama 3 (8B), Gemma 7B, Command-R
  • Lowest Latency: Command-R, Command-R+, Mistral Small
  • Largest Context Window: Gemini 1.5 Pro; Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku, Claude 2.1; GPT-4 Turbo, GPT-4 Vision; Command-R+, Command-R

Other Notable Models:

  • Baidu: Ernie 4.0 (1T Dense) (Chinese)
  • Meta AI: Llama 2 (Dense)
  • Amazon: Olympus (2T Dense)
  • Anthropic: Claude-2 (Dense)
  • Google/DeepMind: Gemini Ultra (1.xT Dense)
  • OpenAI: GPT-4 Turbo
