StructChem

Structured Chemistry Reasoning with Large Language Models

1University of Illinois Urbana-Champaign, 2Shanghai Jiao Tong University, 3New York University, 4University of Washington, 5Allen Institute for AI, 6University of California San Diego

This project introduces StructChem, a simple yet effective prompting strategy that provides structured guidance and substantially boosts LLMs' chemical reasoning capability.

As shown in the figure above, a complex chemistry problem requires not only understanding individual concepts (e.g., molecule properties), as in previous tasks, but also their rich, dynamic interactions in different contexts, involving extensive domain knowledge (e.g., chemical formulae), precise scientific computing, and compositional step-by-step reasoning.

Methodology

  • Formulae Generation:
    • Formulae serve as organized and abstracted representations of chemistry knowledge. When humans tackle intricate problems, the initial phase often involves seeking relevant knowledge as a foundation.
    • Since LLMs have encoded a large amount of chemistry knowledge, it is often effective to elicit that knowledge from their parametric storage.
    • We instruct the LLM not only to recite the relevant formulae but also to explain the variables they contain.
  • Step-by-step Reasoning:
    • Grounded on the generated formulae, the LLMs can then reason about the solution to the original question.
    • To induce more precise reasoning and calculation, we adopt program-of-thoughts (PoT) prompting, in which the model emits executable code rather than free-form arithmetic.
  • Confidence-based Review-and-Refinement:
    • The generated formulae and step-by-step reasoning are not always error-free. Errors made during formulae generation or step-by-step reasoning can amplify and propagate through the rest of the generation, leading to wrong answers.
    • We therefore estimate a confidence score for each revision; only high-confidence revisions are accepted and carried into the next refinement iteration.
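The three steps above can be sketched as a single loop. This is a minimal, hedged sketch: the `llm` function is a hypothetical stand-in for any chat-completion API, and the prompts, markers (`[FORMULAE]`, `[REASONING]`, `[REVIEW]`), and canned replies are illustrative, not the paper's exact templates.

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion API, stubbed with
    canned replies so the control flow below is runnable."""
    if prompt.startswith("[FORMULAE]"):
        return ("PV = nRT, where P is pressure (atm), V is volume (L), "
                "n is moles, R is the gas constant, T is temperature (K)")
    if prompt.startswith("[REASONING]"):
        return "n = (2.0 * 1.0) / (0.08206 * 298.0)"
    if prompt.startswith("[REVIEW]"):
        return "CONFIDENCE: 0.9\nREVISION: n = (2.0 * 1.0) / (0.08206 * 298.0)"
    return ""


def structchem(question: str, max_iters: int = 3, threshold: float = 0.8) -> float:
    # 1) Formulae generation: elicit relevant formulae plus variable explanations.
    formulae = llm("[FORMULAE] State the formulae this problem needs and "
                   "explain every variable.\n" + question)
    # 2) Step-by-step reasoning grounded on the formulae, emitted as code (PoT).
    program = llm("[REASONING] Using these formulae, write the solution as "
                  "executable code.\n" + formulae + "\n" + question)
    # 3) Confidence-based review-and-refinement: accept a revision only if its
    #    self-reported confidence clears the threshold.
    for _ in range(max_iters):
        review = llm("[REVIEW] Check the formulae and reasoning; reply with a "
                     "CONFIDENCE line and a REVISION line.\n"
                     + formulae + "\n" + program)
        conf_line, revision_line = review.splitlines()
        if float(conf_line.split(":")[1]) < threshold:
            break  # low-confidence revision: keep the current reasoning
        program = revision_line.split(":", 1)[1].strip()
    # Program-of-thoughts: execute the generated code instead of trusting
    # the model's own arithmetic.
    return eval(program.split("=", 1)[1])
```

Running `structchem("How many moles of an ideal gas fill 1.0 L at 2.0 atm and 298 K?")` evaluates the generated expression numerically, which is the point of PoT: the arithmetic is done by the interpreter, not the model.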

Experimental Results

  • Our proposed method consistently outperforms baseline methods, achieving an average improvement of 30%.
  • StructChem works with both GPT-3.5 and GPT-4. The performance improvement in the few-shot setting is even larger.
  • StructChem achieves substantial performance gains in complex problems with extensive reasoning steps.

  • Teaching smaller open-source models how to reason: (1) chemistry problems generated by GPT-4 serve as the input; (2) reasoning processes generated by StructChem serve as the output.
  • StructChem achieves substantial improvements over baselines, which validates the high quality of the generated reasoning processes.
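One way the (input, output) distillation pairs described above could be serialized for instruction tuning is as JSON Lines. The field names, file name, and the example pair below are assumptions for illustration, not the released data format.

```python
import json

# Hypothetical (input, output) pair: a GPT-4-style chemistry problem paired
# with a StructChem-style reasoning trace. Schema and content are illustrative.
pairs = [
    {
        "input": ("A 2.0 L vessel holds an ideal gas at 1.0 atm and 298 K. "
                  "How many moles does it contain?"),
        "output": ("Formulae: PV = nRT (P pressure, V volume, n moles, "
                   "R gas constant, T temperature)\n"
                   "Reasoning: n = P*V / (R*T) = (1.0 * 2.0) / (0.08206 * 298.0)"),
    },
]

# One JSON object per line, the usual shape for fine-tuning corpora.
with open("structchem_distill.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```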

Analysis

  • Both "structured instruction" and "iterative review-and-refinement" contribute significantly to StructChem's performance in the zero-shot and few-shot settings.
  • While iterative refinement does contribute to performance, our structured-instruction strategy alone is strong enough to achieve comparable performance with strong baselines such as CoT.

  • StructChem is more likely to generate irrelevant formulae than inaccurate ones.
  • The relevance of the generated formulae appears to matter more than their correctness.
  • Complex reasoning ability is still the bottleneck of LLMs.
  • Preciseness is important for solving complex chemistry problems.

Examples

Error examples

BibTeX

@misc{ouyang2024structured,
      title={Structured Chemistry Reasoning with Large Language Models}, 
      author={Siru Ouyang and Zhuosheng Zhang and Bing Yan and Xuan Liu and Yejin Choi and Jiawei Han and Lianhui Qin},
      year={2024},
      eprint={2311.09656},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}