“Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas” by Jake Mendel, Max Nadeau, Peter Favaloro

EA Forum Podcast (All audio) - A podcast by the EA Forum Team - Thursdays

This is a link post. Open Philanthropy is launching a big new Request for Proposals for technical AI safety research, with plans to fund roughly $40M in grants over the next 5 months, and funding available for substantially more depending on application quality. Applications (here) start with a simple 300-word expression of interest and are open until April 15, 2025.

Overview

We're seeking proposals across 21 different research areas, organized into five broad categories:

Adversarial Machine Learning
* Jailbreaks and unintentional misalignment
* Control evaluations
* Backdoors and other alignment stress tests
* Alternatives to adversarial training
* Robust unlearning

Exploring sophisticated misbehavior of LLMs
* Experiments on alignment faking
* Encoded reasoning in CoT and inter-model communication
* Black-box LLM psychology
* Evaluating whether models can hide dangerous behaviors
* Reward hacking of human oversight

Model transparency
* Applications of white-box techniques
* Activation monitoring
* Finding feature representations
* Toy models for interpretability
* Externalizing reasoning
* Interpretability [...]

---

First published: February 6th, 2025

Source: https://forum.effectivealtruism.org/posts/XtgDaunRKtCPzyCWg/open-philanthropy-technical-ai-safety-rfp-usd40m-available

---

Narrated by TYPE III AUDIO.
