How I Got GPT-4 to Predict the Future
This week I decided to try an experiment: could GPT-4 predict the future? I'll explain the setup, why this is difficult even for standard forecasting methods, the result of the experiment, and what it means for your future with AI.
The Setup
My college had nominated me for a university mentoring award. But the award ceremony was at the same time as one of my classes. I asked if I needed to be there, and an administrator said, "If you win, you definitely want to be there."
Great. Except how am I supposed to know whether I'm going to win or not?
This is the moment where Phineas would say, "I know what we're going to do today!"
I realized that the biographies for all of the nominees had already been posted online. There were 8 of us, and it seemed easy to ask GPT-4 to judge which of the nominees was favored to win. But there was one problem.
The award was not being decided on the biography alone. A whole packet was under consideration. The official items included:
The biography
A CV focused on mentoring-related activities
A letter from the nominee's department chair
2-3 letters from students mentored by the nominee
I only had access to the first one. And it was the smallest piece of information in the whole set. Furthermore, I was ignoring tons of information on how long the professor had been on campus, whether they had a reputation that would make them the obvious choice, and so on. Could I really trust that GPT-4 could predict the winner?
That's when I thought of a way to test it. See, the biographies for all of last year's nominees were also on the site. But even better, I already knew who won. So I could ask GPT-4 to evaluate the biographies, rank all of the candidates, and then test to see how well the actual winner fit in the rankings.
The result was impressive: GPT-4 ranked the actual winner number one overall.
Now, I was impressed, but I still had my doubts. I had looked at these biographies when I was submitting my materials, and I remembered reading the winner's and thinking, "Well of course she won. Look at those accomplishments!" You would think that all of the nominees had impressive achievements, but the biographies read more like this:
Dr. X has mentored the most students of anyone at the university. She received $1 million to start a lab purely focused on mentoring students, and it employs dozens of students every year.
Dr. Y is passionate about mentoring.
You don't need artificial intelligence to judge who won. Regular old-fashioned intelligence will do just fine.
The problem was that the biographies for this year's nominees looked like a lot of Dr. X with hardly any Dr. Y. Would GPT-4 be able to do it with stiffer competition?
I sent the biographies to GPT-4 and it sent me back the ranking. And it was not good.
See, I actually lied above. I didn't just put last year's biographies through GPT-4. I added mine in there too. I just wanted to see how I did. And while it still correctly chose last year's winner, it ranked me second. That's not a huge surprise because I modeled my biography off of last year's winner. But I think everyone else did too. Because this year I was ranked 5th. I didn't like my chances.
To make this experiment as valid as possible, I posted the rankings on Twitter before the award was announced.

A Hard Problem
What makes this problem hard?
Just about everything in it. Forecasting in general is very hard. For example, at the height of COVID it was incredibly difficult to predict how many cases were coming just one week in advance. And GDP forecasts are notoriously unreliable.
Maybe the best demonstration of the difficulty of forecasting is March Madness. The men's college basketball season consists of about 30 games. Each team takes at least 50 shots per game. That's a lot of data to assess how good a team is. It seems like going into the March Madness tournament, we would have a good idea of which teams were the best and which were on the margin. Yet, every year there are massive upsets in the first round. Because forecasting is hard.
What makes it hard? It's the fundamental problem with the future. You're looking at a world that hasn't happened yet. Yes, the past has some patterns that can help you form expectations, but there are always things that haven't happened yet that can completely alter the outcome. For example, in August 2019, who could have predicted that a global pandemic would hit in six months and completely disrupt the world economy? And it doesn't have to be something so dramatic. What if one of the country's most dominant basketball teams loses a key player just before the NCAA tournament? That player was instrumental throughout the season, and we just don't have the data to forecast how his team will perform without him.
And COVID, GDP, and March Madness are cases where we actually have a decent amount of data and models to guide our analysis. I'm giving GPT-4 nothing here. All I gave it was a description of the award and the biographies.
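To give a sense of how little structure this task has, here is a minimal sketch of the setup: everything the model gets is one prompt containing the award description and the biographies. The prompt wording, the placeholder bios, and the commented-out API call are my assumptions for illustration, not the exact text I used.

```python
# Sketch of the ranking setup: one prompt, no extra data or model.
# Award description and bios below are illustrative placeholders.

def build_ranking_prompt(award_description: str, bios: dict[str, str]) -> str:
    """Assemble a single prompt asking the model to rank the nominees."""
    parts = [
        "You are judging a university mentoring award.",
        f"Award criteria: {award_description}",
        "Rank the following nominees from most to least likely to win, "
        "and briefly justify each ranking.",
    ]
    for name, bio in bios.items():
        parts.append(f"--- Nominee: {name} ---\n{bio}")
    return "\n\n".join(parts)

prompt = build_ranking_prompt(
    "Recognizes sustained, high-impact mentoring of students.",
    {
        "Dr. X": "Mentored the most students of anyone at the university...",
        "Dr. Y": "Is passionate about mentoring.",
    },
)

# The assembled prompt would then go to the model in a single message, e.g.:
#   client.chat.completions.create(model="gpt-4",
#       messages=[{"role": "user", "content": prompt}])
```

That's the whole pipeline: the model has to extract a ranking signal from a few paragraphs of prose, with no historical data to lean on.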
So how did it do?
The Result
While the ceremony was long, fortunately the wait was not. This was the first award announced, and I sat anxiously awaiting one of two things. Either I would win the award, or GPT-4 would correctly predict the outcome. Either was a victory for me.
After reading the biographies to the audience, the presenter dramatically revealed the answer in the envelope. It did not go to me. But it did go to JK, the nominee GPT-4 ranked number one.
Holy crap.
The Takeaway
I'm still trying to process how it got that right. Something about JK's biography matched the language of the award's criteria. But other than that, I'm baffled.
What did I learn from this?
One of the areas where young professionals can struggle is in presenting their accomplishments. I address this in my video on tips for writing a statement of purpose. Which of these is more convincing?
Dr. Palsson is a passionate teacher loved by students.
Dr. Palsson has taught thousands of students in his five years at the university, and his development economics course is the most popular upper-division elective in the economics department.
The second description is more powerful because it discusses specific details about teaching and gives the reader a concrete image of the accomplishment. But people avoid the second type of description and instead default to the bland first one.
If AI can predict who is going to win an award, it can help you pitch yourself. You could write your statement of purpose, then ask the AI to interview you about the experiences you mention. With the details it discovers, it could suggest how you could improve your essay to better sell your accomplishments. You could use this to refine your CV, your school applications, your job applications, even your Tinder profile!
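That workflow can be sketched as a simple loop: paste in your draft, have the model ask probing questions, and feed the answers back for revision suggestions. The prompt wording and the `ask_model` placeholder below are my own illustrative assumptions; wire in whatever chat API you use.

```python
# Hedged sketch of the "AI interviews you about your draft" workflow.
# ask_model is a placeholder for a real chat-API call (e.g. GPT-4).

def ask_model(prompt: str) -> str:
    """Placeholder: send `prompt` to your chat API and return the reply."""
    raise NotImplementedError  # wire up your API client here

def interview_prompts(draft: str, rounds: int = 3) -> list[str]:
    """Build one interview prompt per round, each probing for a
    concrete, quantifiable detail the draft leaves out."""
    prompts = []
    for i in range(rounds):
        prompts.append(
            "Here is a draft statement of purpose:\n"
            f"{draft}\n\n"
            f"Ask me one specific question (question {i + 1} of {rounds}) "
            "that would surface a concrete, quantifiable detail I left out."
        )
        # In practice: answer = ask_model(prompts[-1]); record your answer,
        # then ask the model to rewrite the draft using the new details.
    return prompts

prompts = interview_prompts(
    "Dr. Palsson is a passionate teacher loved by students."
)
```

The point of the loop is to pull the Dr. X-style specifics out of your head; the model then has raw material to turn a bland claim into a concrete accomplishment.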
Like I've said before, AI is going to significantly expand access to resources for people who had no chance of getting them before. This kind of thing could be done by a career counselor, but how many people today can afford one? How many academic counselors familiar with applying to US colleges are in Ghana?
How many people are misallocated to low productivity jobs because they have no way of selling themselves to the jobs where they could produce the greatest value?
As AI expands access, it expands opportunity.