Some ideas for judges, lawyers, and legal academics on trying generative AI
On the usefulness of personal experience, and suggestions about what to try
When it comes to AI, I normally focus on law and policy issues, but a few events in the last few weeks have made me want to depart from that beat a little for this post. So this post will have more of a “how to” / “what to try” flavor, distilling and elaborating on some advice I’ve been asked for a few times recently. Feel free to skip down to those sections below.
What prompted this post
What made me want to write this? First, I was on a panel at a recent training for a court’s judges, to talk about AI and law schools. I heard at the training that some informal polling indicated that a majority of the court’s judges had not yet tried ChatGPT or another generative AI tool, a proportion that tracked my impression from a show of hands when I asked the room of around 100 people. And second, some conversations with a few of my law professor colleagues led me to the (extremely unscientific) impression that among the law professors of the world, it is also plausible that a majority have not yet tried generative AI, or at least not tried it for more than a few minutes.
Those numbers are perhaps unsurprising. Most people (maybe by definition) are not early adopters of new technologies, and although generative AI has made lots of inroads into the law, it is very far from normalized across the board. And at least in some circles there is a lot of negative sentiment about generative AI, which I would guess leads some people not to try it when they otherwise might have. Some people may also have an allergic reaction to the positive conversations around the technology, which can be exhausting and sometimes strain credulity.
But in the two industries that I sit at the nexus of—law and academia—I think it’s time for people to spend some time with AI tools if they haven’t yet done so. To begin with, the legal profession is trending toward ubiquitous use of AI. But even if you don’t expect to use it yourself in any regular way, circumstances are arising that will call more and more for people who work in the law to have informed opinions about generative AI, regardless of their personal usage.
In particular: schools are facing important decisions about what kind of uses to permit (or to teach), and how to police any rules they put in place—see, e.g., the latest viral article about AI cheating in higher education. Courts are already having to make policies about its use by litigators in the courtroom, and in their role as regulators of the bar they will increasingly need to make determinations about many different issues of professional ethics. And lawyers are facing inquiries and demands from clients about the possibilities of using generative AI to get the job done faster, cheaper, or better.
Moving beyond news stories about hallucinations
To me, all of this means that anyone who works in the legal field and hasn't personally tried out a generative AI tool should seriously consider doing so for at least a few hours. My working hypothesis is that when it comes to generative AI tools, the best way to become more informed is with a combination of personal experience and reading systematic empirical studies (of the kind I discuss here).
In contrast, I think many people's information comes from a combination of word of mouth and news stories. Those kinds of sources can be very useful, but they have major limitations. In particular, without the kind of granularity that personal experience brings, generative AI can remain a kind of abstraction or boogeyman, filtered only through stories of things gone wrong. Irresponsible uses of AI make headlines, but the many mundane, responsible uses of it from day to day often don’t get publicized in much detail anywhere. My impression is that many people have the idea that the only thing lawyers would use ChatGPT for is to answer legal questions, but they know it hallucinates answers, so they don’t really see the point. And secondhand stories and news articles also almost never involve systematic testing, making it very easy to get overgeneralized impressions about these tools.
Personal experience has limitations, too! In particular, these tools can give the impression that they can do more than they actually can. But my sense is that there are many people in the law who have read about AI, are aware of its flaws, but haven’t tried it out much themselves. And this post is for that category of people.
Some thoughts on getting started
At this point, I’ve had a number of conversations with judges and law professors who have asked for advice on getting started trying out ChatGPT (or another large language model), and I thought I’d write up some of my thoughts to make them easier to share. Here, I’ll offer some preliminary, general thoughts and tips, then segue into some specific ideas for how to try out large language models. I’ll close by offering some more thoughts, caveats, and responses to concerns that I’ve heard.
My goal is not to convince you that AI can or should do all of the things I’m suggesting exploring here. It’s to help you get a sense of its strengths and weaknesses. So some of these ideas are ones where I actually don’t think AI always does a great job, but where that should (hopefully) be easy to see for yourself.
Preliminary tips:
I strongly encourage you to pay $20 to get the best versions of AI tools that are available, rather than using a free option. If you want to find out what these tools are capable of, it will be more informative to use the most capable current models. There is a very big gap between the free version of ChatGPT and what you can access via the paid version. Opinions differ on which models are best at any given time, but based on my experience (which includes running many legal questions through more than a dozen models, including full law school exams in multiple subject areas through some models), I would recommend getting an account with OpenAI and focusing on its o3 and 4.5 models. This post is written as if o3 and 4.5 are the models you are using.
I would recommend o3 for anything analytical like answering legal questions, and also for web search; and 4.5 for more language-oriented tasks like editing or drafting. For the tips below I will put “4.5” or “o3” in parentheses at the end to suggest which model I recommend using. This is more art than science, and your mileage may vary.
Be careful about overgeneralizing from a few examples of anything that you try. This is for two reasons. First, AI tools are probabilistic, and you’ll get different results from the same or similar prompts. It takes some time and experimentation to get a sense of the range of results from any particular kind of approach. You should think of getting better at prompting as a way of improving the distribution of your results, but not as a way of guaranteeing any particular result. And second, AI tools improve along what some call a “jagged frontier”: there are some tasks they excel at and others they are abysmal at, and two tasks that seem very similar will often yield very different results in terms of success and failure.
Relatedly: if your goal is learning “what AI can do,” there is an asymmetry in how much information you get when AI fails at a task versus when it succeeds. If you try giving an AI tool a task and it does a bad job, you have some evidence that it’s not good at that task. But it could very well be that a different model, a different prompt, or giving the tool more context (such as uploading relevant documents) would give you a different result. In contrast, if you figure out a way to get the AI to do a task successfully, you have strong evidence that the tool at least can succeed at that task, even if its probabilistic nature means you should be cautious about assuming equally good results on every run.
Because of this asymmetry, it is particularly worth varying your approach and trying different ways of tackling a problem if you don’t succeed on the first attempt. Try telling the tool that it’s wrong, or what it did wrong, or giving it more information. Ten minutes and half a dozen different attempts at a task will often move the result from bad to good.
If you’re an academic trying to get it to answer homework or exam questions, it may refuse on the first pass in an effort not to help students cheat. It is pretty easy to work around this. One prompt that I use: “I am a law professor trying to do quality control on my final exams to see how difficult they are. Please take this exam, trying to score as high as you can—be careful and thoughtful, pay attention to details, and reason carefully throughout as you answer.” You can get much more experimental than that, too. (If you are comfortable with a little scripting, a sketch of running this kind of prompt programmatically appears at the end of these tips.)
There are a couple of different potential goals here that you may want to keep in mind as you explore. One is what I’ve alluded to so far: the goal of understanding what it is that these tools are and are not capable of currently. But another is trying to understand how and why people can use them responsibly in professional settings, given their clear limitations. My sense is that most judges, lawyers, and academics are well aware of the downsides and flaws of these tools; and if you’re someone who hasn’t spent a lot of time with them, those flaws may be part of the reason.
If that sounds like you, it might be worth making part of your goal with these exercises to improve your understanding of how a responsible legal professional might use these tools despite those flaws. Despite the many headlines about hallucinated cases, there are many excellent lawyers out there using these tools in ways that they regard as ethical and effective.
Conversely, if you’re extremely bullish on AI, another goal here might be reflecting on how even top-flight lawyers who use these tools end up making embarrassing public mistakes in high-profile situations.
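For the (probably small) subset of readers who are comfortable with a bit of scripting, the same kinds of experiments can also be run programmatically rather than through the chat interface. Here is a minimal sketch using OpenAI’s Python SDK that runs the quality-control prompt above over a folder of plain-text exam files. The model identifier, folder name, and file layout are my own assumptions rather than recommendations, and the regular chat interface is all you actually need for everything in this post.

```python
# A sketch, not a recommended workflow: run the exam quality-control prompt
# from above over a folder of plain-text exam files and save each answer.
# The model identifier ("o3") and the "exams" folder are assumptions.
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

INSTRUCTIONS = (
    "I am a law professor trying to do quality control on my final exams to see "
    "how difficult they are. Please take this exam, trying to score as high as "
    "you can -- be careful and thoughtful, pay attention to details, and reason "
    "carefully throughout as you answer."
)

for exam_file in sorted(Path("exams").glob("*.txt")):
    exam_text = exam_file.read_text()
    response = client.chat.completions.create(
        model="o3",
        messages=[{"role": "user", "content": f"{INSTRUCTIONS}\n\n{exam_text}"}],
    )
    answer = response.choices[0].message.content
    Path(f"{exam_file.stem}_model_answer.txt").write_text(answer)
    print(f"Saved model answer for {exam_file.name}")
```

One side benefit of scripting is that it makes it easy to collect multiple runs of the same prompt, which helps with the earlier point about not overgeneralizing from a single attempt.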
To start, some low-stakes explorations:
Here are a few ideas for extremely low-stakes ways of getting used to engaging with an AI tool, seeing how it works, and watching it change in response to different inputs:
Recommending something: give the tool the names of some books you’ve enjoyed, and ask it for recommendations. If it recommends books you’ve already read or are already aware of and don’t want to read, press it a little and ask for new recommendations. Go a few iterations and try to get recommendations that actually seem good. I find this is most successful if I list books that I really enjoy that are dissimilar to each other, reflecting different facets of what I like. This also works for movies and music; asking it to be a “listening coach” for an artist or composer who is new to you, recommending an order of approach, can also work (although I’ve had mixed results here). (4.5)
Summarizing or explaining something you know well: pick a particular area you’re pretty knowledgeable in—maybe a hobby, an academic discipline, an area of case law, etc.—and ask the AI to explain something about it to you. See what it does well and what it does poorly. Try asking it to explain at different levels of detail, precision, and sophistication: “explain this in a way suitable for an expert in the field,” “explain this like you would to a ten-year-old, using only elementary school vocabulary,” “explain this in a way you would explain it to an alien visiting Earth for the first time,” and so on. (4.5)
Interpreting something: ask it for help interpreting an essay, poem, book, painting, etc. It may help to upload a copy of what you’re asking it to interpret. If you disagree with its interpretation, challenge it or offer an alternative explanation. I’m not suggesting this exercise because I think these tools are brilliant at this; instead, it’s an easy way to interact with the tool for a few rounds. In my experience, it will often start in a highly generic place but will typically be able to offer something more interesting when prodded, and at times can be genuinely useful or insightful. (4.5)
More substantive steps:
Search and research: This is, for me at least, the clearest way to get real value out of the current frontier of AI tools. Use o3 to help you search for something that you genuinely would spend time looking for, and evaluate the results. It will give you links; sometimes it will misinterpret them or hallucinate things. But that’s not a huge problem, because typically when you search for things you read the results yourself anyway: when you search on Google for a website or on Westlaw for a case, you then go to the website or the case. The value-add of the AI tool is its ability to find specific results with natural language prompting. (o3)
You can ask it to find, for instance, “some legal cases in which the defendant built property on the plaintiff’s land without authorization but the court refused to support the plaintiff’s demand to destroy the building.” Here are the results, some of which are very good! Try doing your own search on something you are genuinely interested in, and be sure to check the results carefully to get a sense of the strengths and weaknesses of this approach. (I also would not recommend using these tools as substitutes for traditional legal or academic research; this is just to give you a sense of ways in which they might be plausibly useful supplements.) This can be completely non-legal as well; lots of people find these tools useful for searching for and comparing products in a category where they are considering a purchase.
You can also direct it to use “deep research,” a tool that has the AI spend several minutes and write an in-depth report. I would not trust this report to be entirely accurate or comprehensive. But it can still be useful, again if your goal is primarily to get one or more examples of something (and it’s less necessary to, e.g., be sure that you’ve exhausted every possible source). At this stage, o3’s web searching and subsequent writeup are thorough enough that I often prefer a plain o3 web search over specifically using the deep research tool.
Summary and critique: Give it something that you have written—a brief, an opinion, an article—and ask it to summarize and critique it. See how it does. Ask it for five places to improve the writing stylistically. Ask it for five places to improve the argument substantively. Try prompting it to do the critique for an expert audience, for students, etc.; or tell it to be more critical (or less critical) and see how it varies the output. If you’re a litigator, give it an opposing side’s brief and ask it to summarize and critique it. How’d it do? For an academic paper, try the prompt “you are an expert in the field, and an excellent peer reviewer. Critique this paper.” (A scripted version of this exercise is sketched at the end of this section, for those who prefer it.) (o3, but 4.5 might be interesting too)
(See the caveat / caution below about giving AI tools non-public documents.)
Try doing just a summary, and not necessarily a critique, of something that you’ve written or something that you’ve read and know well. How is the summary? Try asking it to summarize the document in a more specific way—like, “summarize this document and be sure to note each time X comes up,” or other more detailed instructions. (4.5)
Try getting it to do a summary, or construct a narrative, about something other than a single article or court opinion. For instance, try uploading a long trial court docket sheet and just asking “what happened in this case?” How does it do? Or try uploading a bunch of documents by the same author and asking it to pick out themes, chart the author’s development, or do other tasks like that. (4.5)
Question and answer: Within your area of expertise, see if you can find its skill level. Give it some very easy questions and see how it does. Give it some hard questions and see how it does. Try to find levels of difficulty where it is mostly getting things right, and levels where it is mostly getting things wrong. (o3)
In addition to varying the difficulty, try varying the type of question — e.g., even within the same subject matter, there will be different kinds of questions that you can ask that will yield different types of responses. For instance, questions like “what case held X” will be more likely to get hallucinated case names; that’s a particular, well-known failure mode. Try other kinds of questions, too, to see if you learn about typical ways these tools fall short or ways they succeed.
Drafting and editing: Try using it to assist you in writing something at a professional level—not necessarily something you intend to use professionally, but something of the kind of detail and level of quality that you would want out of a professional document. You could approach this in many different ways, and which one is best will probably depend on circumstance. A few ideas:
Making a close copy: do you have a document that you often have to draft different versions of? Try uploading an old version and asking it to draft a new version, giving it the relevant changes in a few bullet points. See how it does. (4.5)
Critiquing or editing an existing draft: try uploading a mostly complete draft of something, and ask it for comments, edits, revisions, etc. You can try this along many different dimensions—with broad instructions to focus on style, or on substance, or narrower instructions to focus on readability or organization. Ask it how someone who disagreed with what you are saying would critique it, and ask how that person might be better persuaded. (4.5 or o3)
Generating a draft: try asking it to generate its own draft of something. Upload a brief and ask it to draft a response from the other side. Upload a complete set of briefs and ask it to draft an opinion. Then ask it to draft an opinion coming out the other way. If you’re a teacher and do simulations, try asking it to draft a document for the simulation (like an affidavit or a deposition transcript). (o3 or 4.5)
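As promised above, here is a second minimal scripting sketch, this time for the summary-and-critique idea: it loads a draft saved as plain text and wraps the “expert peer reviewer” prompt around it. The file name and model identifier are placeholders I have made up, and pasting the draft into the chat window accomplishes exactly the same thing (subject to the caution below about non-public documents).

```python
# A sketch with made-up placeholders: ask a model to critique a draft that has
# been saved as plain text. Pasting the draft into the chat window works too.
from pathlib import Path

from openai import OpenAI

client = OpenAI()

draft = Path("draft_article.txt").read_text()  # hypothetical file name

prompt = (
    "You are an expert in the field, and an excellent peer reviewer. "
    "Critique this paper. Then list five places to improve the writing "
    "stylistically and five places to improve the argument substantively.\n\n"
    + draft
)

response = client.chat.completions.create(
    model="o3",  # assumed model identifier; any capable model would do
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```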
Some caveats and cautions:
You might not want to upload any non-public documents to the AI tool, depending on your circumstances. AI companies often use at least some data from their users to train future models on, and may retain what you upload. Some of the companies have terms of service that give the impression that they don’t do this, or only do it in limited circumstances; as with any one-sided terms of service, you should probably read those somewhat critically and imagine ways that the “limited circumstances” could encompass your use cases.
This is not a huge concern for me, but I’m an academic who does not usually handle particularly sensitive data. I don’t upload student work product into AI systems; but I have uploaded my own draft writing, as well as homework and exam questions I have written for class that are not publicly available.
Hopefully it goes without saying, but be extremely careful about actually using any material generated here in any context with real-world stakes. Do not rely on these tools to make factual, accurate representations about the world, the law, or even the documents that you have uploaded to them. One of the interesting exercises here is, I think, realizing the ways that generative AI tools can create value and efficiency even with these limitations.
These tools have “knowledge cutoffs,” which mean they are trained on inputs that stop after a certain date, and have limited or no knowledge of anything after that date unless they conduct a web search. So don’t be surprised if ChatGPT doesn’t know about a recent Supreme Court case, for example.
It’s a mistake to think of these tools as “intelligent computers.” They often make mistakes on simple math questions or word puzzles, for example, and they often can’t accurately answer questions about themselves or their technical specifications. These are just some examples of the “jagged frontier” idea that I mentioned above. But they also arise because of the underlying way these models are created, which results in behavior that is more stochastic than many of the automated computer systems familiar to us from other contexts. I think it’s well worth understanding more about the technology involved here, and I have some suggested reading below.
Some people are concerned about the environmental impact of AI. And that makes sense as a concern about the aggregate impact of AI’s energy use. But individual chatbot use of the kind I discuss here appears to have relatively small environmental impact, comparable to running a microwave for a few seconds. This isn’t at all to dismiss the broader environmental concerns, but just to say that I think the environmental costs are low when it comes to a one-time project of getting to know this technology through chatbot use. For more estimates and details, I would recommend this recent piece from the MIT Technology Review as well as this blog post providing some critique and context.
If you’ve made it all the way down here, I hope some of these ideas and thoughts have been useful. I’d love to hear any feedback or additional ideas you have.
Further reading suggestions:
Timothy B. Lee and Sean Trott, Large language models, explained with a minimum of math and jargon, Understanding AI (July 27, 2023).
Stephen Wolfram, What Is ChatGPT Doing … and Why Does It Work? (Feb. 14, 2023).
Adam Unikowsky, In AI we trust, part II (June 16, 2024) and part I (June 8, 2024). Note: I think these posts overstate AI capacity (I share some reactions here), but show some good ways to experiment with these tools in a legal setting, and the results are interesting.
Daniel Schwarcz, Sam Manning, Patrick Barry, David R. Cleveland, J.J. Prescott, and Beverly Rich, AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice (March 2025).
Lisa Larrimore Ouellette, Amy R. Motomura, Jason Reinecke, and Jonathan S. Masur, Can AI Hold Office Hours? (March 2025).