StructChem

Structured Chemistry Reasoning with Large Language Models

1University of Illinois Urbana-Champaign, 2Shanghai Jiao Tong University, 3New York University, 4University of Washington, 5Allen Institute for AI, 6University of California San Diego

This project introduces StructChem, a simple yet effective prompting strategy that provides structured guidance and substantially boosts LLMs' chemical reasoning capability.

As shown in the figure above, a complex chemistry problem requires not only understanding individual concepts (e.g., molecule properties), as in previous tasks, but also their rich, dynamic interactions in different contexts, involving extensive domain knowledge (e.g., chemical formulae), precise scientific computing, and compositional step-by-step reasoning.

Methodology

  • Formulae Generation:
    • Formulae serve as organized and abstracted representations of chemistry knowledge. When humans tackle intricate problems, the initial phase often involves seeking relevant knowledge as a foundation.
    • Since LLMs have encoded a large amount of chemistry knowledge, it is often effective to elicit that knowledge from their parametric storage.
    • We instruct the LLM not only to recite the relevant formulae but also to explain the variables they contain.
  • Step-by-step Reasoning:
    • Grounded on the generated formulae, the LLMs can then reason about the solution to the original question.
    • To induce more precise reasoning and calculation, we adopt program-of-thoughts (PoT) prompting, in which the model emits executable code rather than free-form arithmetic.
  • Confidence-based Review-and-Refinement:
    • The generated formulae and step-by-step reasoning are not always error-free. Errors made during formulae generation or step-by-step reasoning can amplify and propagate through the rest of the generation, leading to wrong answers.
    • We therefore estimate a confidence score for each revision; only high-confidence revisions are accepted and carried into the next refinement iteration.
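The three steps above can be sketched as a single loop. This is a minimal, hedged sketch: the `llm` function is a hypothetical stand-in for any chat-completion API, and the prompts, markers (`[FORMULAE]`, `[REASONING]`, `[REVIEW]`), and canned replies are illustrative, not the paper's exact templates.

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion API, stubbed with
    canned replies so the control flow below is runnable."""
    if prompt.startswith("[FORMULAE]"):
        return ("PV = nRT, where P is pressure (atm), V is volume (L), "
                "n is moles, R is the gas constant, T is temperature (K)")
    if prompt.startswith("[REASONING]"):
        return "n = (2.0 * 1.0) / (0.08206 * 298.0)"
    if prompt.startswith("[REVIEW]"):
        return "CONFIDENCE: 0.9\nREVISION: n = (2.0 * 1.0) / (0.08206 * 298.0)"
    return ""


def structchem(question: str, max_iters: int = 3, threshold: float = 0.8) -> float:
    # 1) Formulae generation: elicit relevant formulae plus variable explanations.
    formulae = llm("[FORMULAE] State the formulae this problem needs and "
                   "explain every variable.\n" + question)
    # 2) Step-by-step reasoning grounded on the formulae, emitted as code (PoT).
    program = llm("[REASONING] Using these formulae, write the solution as "
                  "executable code.\n" + formulae + "\n" + question)
    # 3) Confidence-based review-and-refinement: accept a revision only if its
    #    self-reported confidence clears the threshold.
    for _ in range(max_iters):
        review = llm("[REVIEW] Check the formulae and reasoning; reply with a "
                     "CONFIDENCE line and a REVISION line.\n"
                     + formulae + "\n" + program)
        conf_line, revision_line = review.splitlines()
        if float(conf_line.split(":")[1]) < threshold:
            break  # low-confidence revision: keep the current reasoning
        program = revision_line.split(":", 1)[1].strip()
    # Program-of-thoughts: execute the generated code instead of trusting
    # the model's own arithmetic.
    return eval(program.split("=", 1)[1])
```

Running `structchem("How many moles of an ideal gas fill 1.0 L at 2.0 atm and 298 K?")` evaluates the generated expression numerically, which is the point of PoT: the arithmetic is done by the interpreter, not the model.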

Experimental Results

  • Our proposed method consistently outperforms baseline methods, achieving an average improvement of 30%.
  • StructChem works with both GPT-3.5 and GPT-4. The performance improvement in the few-shot setting is even larger.
  • StructChem achieves substantial performance gains in complex problems with extensive reasoning steps.

  • Teaching smaller open-source models how to reason: (1) chemistry problems generated by GPT-4 serve as the input; (2) reasoning processes generated by StructChem serve as the output.
  • StructChem achieves substantial improvements over baselines, which validates the high quality of the generated reasoning processes.
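One way the (input, output) distillation pairs described above could be serialized for instruction tuning is as JSON Lines. The field names, file name, and the example pair below are assumptions for illustration, not the released data format.

```python
import json

# Hypothetical (input, output) pair: a GPT-4-style chemistry problem paired
# with a StructChem-style reasoning trace. Schema and content are illustrative.
pairs = [
    {
        "input": ("A 2.0 L vessel holds an ideal gas at 1.0 atm and 298 K. "
                  "How many moles does it contain?"),
        "output": ("Formulae: PV = nRT (P pressure, V volume, n moles, "
                   "R gas constant, T temperature)\n"
                   "Reasoning: n = P*V / (R*T) = (1.0 * 2.0) / (0.08206 * 298.0)"),
    },
]

# One JSON object per line, the usual shape for fine-tuning corpora.
with open("structchem_distill.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```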

Analysis

  • Both "structured instruction" and "iterative review-and-refinement" contribute significantly to StructChem's performance in the zero-shot and few-shot settings.
  • While iterative refinement does contribute to performance, our structured-instruction strategy alone is strong enough to achieve comparable performance with strong baselines such as CoT.

  • StructChem is more likely to generate irrelevant formulae than inaccurate ones.
  • The relevance of the generated formulae appears to matter more than their correctness.
  • Complex reasoning ability is still the bottleneck of LLMs.
  • Preciseness is important for solving complex chemistry problems.

Examples

Error examples

BibTeX

@misc{ouyang2024structured,
      title={Structured Chemistry Reasoning with Large Language Models}, 
      author={Siru Ouyang and Zhuosheng Zhang and Bing Yan and Xuan Liu and Yejin Choi and Jiawei Han and Lianhui Qin},
      year={2024},
      eprint={2311.09656},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}