AI Safety Fundamentals: Alignment
A podcast by BlueDot Impact
83 Episodes
Constitutional AI Harmlessness from AI Feedback
Published: 2024-07-19
Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Published: 2024-07-19
Illustrating Reinforcement Learning from Human Feedback (RLHF)
Published: 2024-07-19
Chinchilla’s Wild Implications
Published: 2024-06-17
Deep Double Descent
Published: 2024-06-17
Intro to Brain-Like-AGI Safety
Published: 2024-06-17
Eliciting Latent Knowledge
Published: 2024-06-17
Toy Models of Superposition
Published: 2024-06-17
Least-To-Most Prompting Enables Complex Reasoning in Large Language Models
Published: 2024-06-17
Discovering Latent Knowledge in Language Models Without Supervision
Published: 2024-06-17
ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation
Published: 2024-06-17
Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions
Published: 2024-06-17
Imitative Generalisation (AKA ‘Learning the Prior’)
Published: 2024-06-17
An Investigation of Model-Free Planning
Published: 2024-06-17
Low-Stakes Alignment
Published: 2024-06-17
Gradient Hacking: Definitions and Examples
Published: 2024-06-17
Empirical Findings Generalize Surprisingly Far
Published: 2024-06-17
Compute Trends Across Three Eras of Machine Learning
Published: 2024-06-13
Worst-Case Thinking in AI Alignment
Published: 2024-05-29
Public by Default: How We Manage Information Visibility at Get on Board
Published: 2024-05-12
Listen to resources from the AI Safety Fundamentals: Alignment course! https://aisafetyfundamentals.com/alignment