Counterfactual reasoning benchmark that tests whether LLMs can reason about hypothetical scenarios and derive their logical implications.
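
The description does not specify an item format or a scoring rule, so the following is only a minimal sketch of how one evaluation loop for such a benchmark might look. The `CounterfactualItem` schema, the multiple-choice prompt layout, the two toy items, and the accuracy metric are all assumptions made for illustration and are not taken from the benchmark itself. The `model` argument stands in for any callable that maps a prompt string to a reply string.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class CounterfactualItem:
    """One hypothetical benchmark item (illustrative schema, an assumption)."""
    premise: str               # counterfactual assumption the model must accept
    question: str              # what follows under that assumption
    choices: Tuple[str, ...]   # candidate answers
    answer: int                # index of the correct choice


def build_prompt(item: CounterfactualItem) -> str:
    """Render an item as a lettered multiple-choice prompt."""
    options = "\n".join(
        f"{chr(ord('A') + i)}. {choice}" for i, choice in enumerate(item.choices)
    )
    return f"Suppose that {item.premise}\n{item.question}\n{options}\nAnswer:"


def parse_choice(reply: str, n_choices: int) -> Optional[int]:
    """Map a reply such as 'B' or ' b.' to a choice index; None if unparseable."""
    text = reply.strip().upper()
    if not text:
        return None
    index = ord(text[0]) - ord("A")
    return index if 0 <= index < n_choices else None


def accuracy(
    items: Sequence[CounterfactualItem],
    model: Callable[[str], str],
) -> float:
    """Fraction of items where the model's parsed choice matches the answer."""
    if not items:
        return 0.0
    correct = sum(
        parse_choice(model(build_prompt(item)), len(item.choices)) == item.answer
        for item in items
    )
    return correct / len(items)


# Toy items written for this sketch only; they are not benchmark data.
ITEMS = [
    CounterfactualItem(
        premise="water froze at 50 degrees Celsius instead of 0.",
        question="At 20 degrees Celsius, a glass of water would be:",
        choices=("liquid", "solid", "gas"),
        answer=1,
    ),
    CounterfactualItem(
        premise="a week had 8 days instead of 7.",
        question="How many days would 3 full weeks contain?",
        choices=("21", "24", "28"),
        answer=1,
    ),
]

if __name__ == "__main__":
    # Stub standing in for an LLM call; it always picks the second option.
    print(accuracy(ITEMS, lambda prompt: "B"))
```

The toy items are chosen so that the factually familiar answer ("liquid", "21") is wrong under the stated premise, which is the property a counterfactual item needs in order to separate reasoning from recall. Exact-match accuracy on a parsed letter is the simplest possible scoring choice; a real harness would likely need more robust reply parsing.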