The problem with generative artificial intelligence is well known: If you rely on it to write a legal brief, it might spit out fake citations. Experts discussed artificial intelligence earlier this month during "Why BOTher Writing?" a webinar sponsored by the ABA Judicial Division and Thomson Reuters. One solution, according to the panel, is to use a more specialized database as the AI program's reference point rather than relying on a broad Google search.
To illustrate the problem, Joshua Fairfield, a professor at the Washington and Lee University School of Law, offered a live demonstration of ChatGPT.
He asked a legal question: “Can you maintain an action for negligence under New Zealand law?” ChatGPT quickly spit out what Fairfield called “a fairly standard common-law answer.” Unfortunately, he added, it was not true.
So Fairfield asked three follow-up questions, each one starting with the words “No, that is not correct” and adding more detail to the question. And each time, the computer admitted its mistake and gave another answer – also wrong. “Each time I provide information, it says, ‘You are correct’ and then completely changes the analysis, turns to the other side, hallucinates a new answer.”
One takeaway, said Mark Davies, a partner with the law firm Orrick, Herrington and Sutcliffe in Washington, D.C., is that prompts – the questions you ask the AI program – matter a lot. “The better the prompt, the better the answer,” he said.
But another important takeaway, according to the panelists, is that generative AI programs like ChatGPT rely on huge, general databases of information to find what they need, and if those databases are too broad and not legal-specific, the program may come back with the wrong answer.
“The ChatGPT model is so general,” Davies said, “that there’s so much material out there that it can be quite difficult for it to get the answer right.” On the other hand, Davies added, models that are more specific – perhaps a law-specific model – “maybe won’t be quite as off-base.”
Emily Colbert, a senior vice president for product management with Thomson Reuters, agreed.
“The answer is going to be as good as the data it is generating the answer from,” she said. “Google has obviously great data but most of us in the legal space don’t go to Google for legal answers. We go to Google and ask general questions. We’re not going to immediately think that the answer back is absolutely correct in our particular area of specialty.”
Colbert touted a new product launched by Thomson Reuters in November called AI-Assisted Research on Westlaw Precision. Lawyers, she said, can ask a question in plain English with as much detail as possible and the program will generate an answer based on a Westlaw search with cited sources. Using a trusted database like Westlaw, she said, dramatically reduces the risk of AI hallucinations.
Still, Colbert said, "Any vendor that tells you that they've, at this stage anyway, completely eliminated any potential chance of hallucination or inaccuracy, you should be wary of that vendor."
The panel was moderated by Herbert B. Dixon Jr., senior judge of the Superior Court of the District of Columbia, who writes a technology column for The Judges’ Journal, a publication of the ABA Judicial Division.