If you are like me, at some point during the working day you pull up ChatGPT. What it can do seems to evolve almost hourly, and as the landscape of artificial intelligence (AI) keeps shifting, each iteration of the Large Language Model (LLM) strives to narrow the gap between machine and human cognition. One milestone in this journey is GPT-4 (Generative Pre-trained Transformer 4), which shows a notable advance over its predecessor, GPT-3.5. The improvements span several areas: accuracy, context understanding, multitasking ability, robustness, and hallucination reduction.
In one comparison, GPT-4 predicted the outcomes of court cases with an accuracy of approximately 88%, compared with 81% for GPT-3.5 (EcoAGI, n.d.). The improvement is attributed to GPT-4’s superior ability to analyze extensive legal documents and interpret intricate relationships between text and images in evidentiary material.
Furthermore, GPT-4 showed its strength in the medical domain when evaluated on the Medical Final Exam (MFE). It passed all versions of the exam with a mean accuracy of 80.7%, outperforming GPT-3.5, which managed a mean accuracy of 56.6% in two out of three versions of the exam (medRxiv, n.d.). Despite this promising performance, GPT-4’s score still trailed the average score of a medical student, a reminder that human-like understanding in AI remains some way off.
One of GPT-4’s significant breakthroughs is its enhanced context understanding. Whereas GPT-3.5 is limited to roughly 3,000 words, GPT-4 can process around 25,000 words of context, allowing a deeper understanding and interpretation of input data (Digital Trends, n.d.). This larger context window is a big step towards handling longer conversations and documents seamlessly.
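To make the difference concrete, here is a minimal Python sketch that checks whether a document fits within the approximate word limits quoted above and, if not, how many pieces it would need to be split into. The names `CONTEXT_WORD_LIMITS`, `count_words`, `fits_in_context`, and `chunks_needed` are hypothetical helpers made up for this illustration, and the limits are the rough word counts from this article. Real models measure context in tokens rather than words, so treat this as a back-of-the-envelope estimate only.

```python
import math

# Approximate context sizes in words, as quoted in this article.
# Assumption: real limits are defined in tokens, not words.
CONTEXT_WORD_LIMITS = {"gpt-3.5": 3_000, "gpt-4": 25_000}


def count_words(text: str) -> int:
    """Count whitespace-separated words in the text."""
    return len(text.split())


def fits_in_context(text: str, model: str) -> bool:
    """Return True if the text is within the model's approximate word limit."""
    return count_words(text) <= CONTEXT_WORD_LIMITS[model]


def chunks_needed(text: str, model: str) -> int:
    """Estimate how many pieces the text must be split into for the model."""
    limit = CONTEXT_WORD_LIMITS[model]
    return max(1, math.ceil(count_words(text) / limit))


if __name__ == "__main__":
    document = "word " * 10_000  # stand-in for a 10,000-word report
    for model in CONTEXT_WORD_LIMITS:
        print(model, fits_in_context(document, model), chunks_needed(document, model))
```

Under these assumed limits, a 10,000-word report fits in a single GPT-4 request but would have to be split into four pieces for GPT-3.5.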
Moreover, GPT-4 improves on multitasking and robustness in few-shot settings, inching closer to human performance. It depends less on good prompting, which makes it more resilient to human-made errors, a trait indispensable for real-world applications (Towards Data Science, n.d.).
While hallucination, the generation of factually incorrect or nonsensical information, remains a challenge, GPT-4 has made commendable progress. It is reported to be 19% to 29% less likely to hallucinate compared to GPT-3.5, rendering its responses on platforms like ChatGPT noticeably more factual (MUO, n.d.).
The strides made by GPT-4 underscore the relentless pursuit of AI that better mimics human cognition. Each improvement not only adds to the model’s efficiency but also broadens its range of applications across diverse domains.
References:
EcoAGI. (n.d.). In-Depth Comparison: GPT-4 vs GPT-3.5. EcoAGI. Retrieved from ecoagi.ai
medRxiv. (n.d.). Evaluation of the performance of GPT-3.5 and GPT-4 on the Medical Final Exam. medRxiv. Retrieved from medrxiv.org
Digital Trends. (n.d.). GPT-4 vs. GPT-3.5: how much difference is there? Digital Trends. Retrieved from digitaltrends.com
Towards Data Science. (n.d.). 4 Things GPT-4 Will Improve From GPT-3. Towards Data Science. Retrieved from towardsdatascience.com
MUO. (n.d.). GPT-4 vs. GPT-3.5: 5 Key Differences Explained. MUO. Retrieved from makeuseof.com