The problem Waterline Development encountered is that commercial AI models are ill-suited to multidisciplinary research, which requires synthesizing expertise from a variety of fields.
"No single AI model does this reliably," the company explains in a white paper [PDF]. "Frontier language models hallucinate under extended multi-step reasoning. They produce plausible answers that silently break when a problem crosses domain boundaries. At best this wastes time; at worst, it poisons critical decision making."
Bednarski said Rozum is not trying to correct LLMs to the point where they can be trusted with critical engineering work such as bridge construction. Rather, the goal is to help researchers, engineers, and scientists do their jobs better.
"We are focused on deterministic tool implementation (ex. RDKit for Chemistry), allowing engineers, scientists, and analysts a direct path to verify outputs in a format familiar to them by domain," he explained.
"Our system orchestration method is heavily focused on deterministic validation (code execution replicated, etc.) of outputs, which roots out hallucinations that plague all models at various times. We see further improvements to this in verifying the methods used in sources we cite as well."
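To illustrate the general idea of checking a model's answer against a deterministic tool, here is a minimal Python sketch. It is not Rozum's implementation, which the company has not published in this article. The names `ATOMIC_MASS`, `molar_mass`, and `check_claim` are hypothetical, the mass table covers only four elements, and a production system would presumably call a full cheminformatics library such as RDKit instead.

```python
import re

# Standard atomic weights in g/mol for a few elements (illustrative subset only).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molar_mass(formula: str) -> float:
    """Deterministically compute the molar mass of a flat formula such as 'C2H6O'."""
    # Reject anything that is not a plain sequence of element symbols and counts.
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_MASS:
            raise ValueError(f"unknown element: {symbol}")
        # A missing count means a single atom of that element.
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


def check_claim(formula: str, claimed_mass: float, tolerance: float = 0.01) -> bool:
    """Return True if a model's claimed molar mass matches the computed value."""
    return abs(molar_mass(formula) - claimed_mass) <= tolerance


# A model claiming ethanol weighs 46.07 g/mol passes; a claim of 44.05 g/mol does not.
print(check_claim("C2H6O", 46.07))
print(check_claim("C2H6O", 44.05))
```

Because the check is a pure function of its inputs, rerunning it always gives the same verdict, which is the property that makes this kind of validation useful against hallucinated figures.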