AI Safety Fundamentals: Alignment
A podcast by BlueDot Impact
83 Episodes
Public by Default: How We Manage Information Visibility at Get on Board
Published: 2024-05-12
Writing, Briefly
Published: 2024-05-12
Being the (Pareto) Best in the World
Published: 2024-05-04
How to Succeed as an Early-Stage Researcher: The “Lean Startup” Approach
Published: 2024-04-23
Become a Person who Actually Does Things
Published: 2024-04-17
Planning a High-Impact Career: A Summary of Everything You Need to Know in 7 Points
Published: 2024-04-16
Working in AI Alignment
Published: 2024-04-14
Computing Power and the Governance of AI
Published: 2024-04-07
AI Control: Improving Safety Despite Intentional Subversion
Published: 2024-04-07
Emerging Processes for Frontier AI Safety
Published: 2024-04-07
AI Watermarking Won’t Curb Disinformation
Published: 2024-04-07
Challenges in Evaluating AI Systems
Published: 2024-04-07
Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small
Published: 2024-04-01
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Published: 2024-03-31
Zoom In: An Introduction to Circuits
Published: 2024-03-31
Weak-To-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Published: 2024-03-26
Can We Scale Human Feedback for Complex AI Tasks?
Published: 2024-03-26
Machine Learning for Humans: Supervised Learning
Published: 2023-05-13
Visualizing the Deep Learning Revolution
Published: 2023-05-13
Four Background Claims
Published: 2023-05-13
Listen to resources from the AI Safety Fundamentals: Alignment course!
https://aisafetyfundamentals.com/alignment