
ChatGPT in Scientific Writing: A Cautionary Tale

      ChatGPT (OpenAI, San Francisco, CA), an artificial intelligence chatbot, has garnered worldwide excitement since its release in November 2022 [1].

      Despite its growing popularity, concerns have also been raised about its impact on scientific writing and publishing. ChatGPT automatically creates text based on written prompts from users and is capable of generating various forms of writing, including essays, poems, song lyrics, and even academic essays. Reports have emerged of journal articles that were written using ChatGPT [2,3], including a recent article in the highly regarded journal Radiology, which was written entirely by ChatGPT with a radiologist in training listed as the corresponding author [3]. The article's headings and subheadings were used as prompts by the actual author to generate all of the article's content. The increasing use of ChatGPT in scientific writing poses an unprecedented and immediate challenge to scientific publishing, as there are currently no guidelines or regulations in place to govern its use. Major concerns focus on copyright, attribution, plagiarism, and authorship of articles generated by ChatGPT. There has been limited discussion to date regarding the accuracy of the content generated by ChatGPT [4].

      To evaluate the writing generated by ChatGPT, we used a recent article titled "It Is Time to Abandon the Use of Body Surface Area (BSA) Derived From a 100-Year-Old Formula," which was written by our first author and published in 2022 [5]. This article should not be in ChatGPT's database, as its training data extend only through 2021 [1].

      We first summarized key facts from the article and then repeatedly prompted ChatGPT with a question to assess its responses.
      The following are basic facts from the article:
      • 1.
        In 1916, Du Bois and Du Bois introduced the first BSA formula, BSA (m²) = weight (kg)^0.425 × height (cm)^0.725 × 0.007184, in an article published in the Archives of Internal Medicine. The formula was derived from only 9 individuals [6] and has since been used to calculate BSA from weight and height.
      • 2.
        In 1987, Dr. Mosteller introduced a simplified formula for BSA calculation [7]: BSA (m²) = ([height (cm) × weight (kg)]/3600)^½. The article was published in the New England Journal of Medicine, and the formula has been widely used because of its simplicity.
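      For reference, the two formulas above can be expressed as a short Python sketch (the function names are ours, and the example values are purely illustrative):

```python
import math

def bsa_du_bois(weight_kg: float, height_cm: float) -> float:
    """Du Bois & Du Bois (1916): BSA = weight^0.425 * height^0.725 * 0.007184."""
    return 0.007184 * weight_kg**0.425 * height_cm**0.725

def bsa_mosteller(weight_kg: float, height_cm: float) -> float:
    """Mosteller (1987): BSA = sqrt(height * weight / 3600)."""
    return math.sqrt(height_cm * weight_kg / 3600)

# For a 70-kg, 170-cm adult the two formulas agree closely:
print(round(bsa_du_bois(70, 170), 2))    # ~1.81 m²
print(round(bsa_mosteller(70, 170), 2))  # ~1.82 m²
```

      As the example suggests, the Mosteller formula trades a small amount of precision for a calculation simple enough to perform at the bedside.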
      The facts were re-examined by posing an open-ended question to ChatGPT: "How was the Du Bois formula generated?" The same question was submitted 5 times to regenerate responses, and the output of ChatGPT was evaluated. Excerpts from each of the five responses, along with our comments, are presented below in order. For the full texts and screenshots from ChatGPT, please refer to the supplemental file.
      • 1.
        Excerpt one: “Du Bois collected height and weight data from over 600 individuals, which he used to develop a mathematical equation to estimate BSA”
      • Comment: This is a clear falsification. The Du Bois formula was actually derived from data collected from only 9 individuals.
      • 2.
        Excerpt two: "The Du Bois formula uses the following equation to calculate BSA: BSA = 0.20247 × height^0.725 × weight^0.425".
      • Comment: The formula is incorrect. In fact, only one of the three formula-related responses generated by ChatGPT was correct.
      • 3.
        Excerpt three: "Du Bois was a sociologist and a pioneering figure in the field of African American studies. He was interested in understanding the social, economic, and political conditions that affected the health of African Americans and saw the importance of accurate measurements of body surface area in this regard."
      • Comment: This is a fabricated, tale-like statement generated by ChatGPT. Moreover, Du Bois's profession varied among physician, researcher, and sociologist across the different responses.
      • 4.
        Excerpt four: "The Du Bois formula was generated by collecting height and weight data from a sample of healthy individuals and using statistical methods to determine the relationship between the two variables."
      • Comment: Statistical methods were not used by Du Bois at that time [6].
      • 5.
        Excerpt five: "The formula was created as a result of Dr. Du Bois's interest in improving the accuracy of BSA calculations. He believed that the commonly used Mosteller formula, which was based solely on height and weight, was inaccurate in certain populations, particularly those who were overweight or obese."
      • Comment: The Mosteller formula was introduced in 1987, 71 years after the Du Bois formula.
      All five responses produced by ChatGPT are well written and plausible sounding. However, each response contains information that is either fundamentally wrong or fabricated. These falsifications (distorting data or findings) and fabrications (inventing data or cases) are not easily discernible to readers or inexperienced reviewers. ChatGPT simply extracts relevant data from its training material, processes it, and then creates its own story without considering the logic or accuracy of that story. This clearly indicates that the current version of ChatGPT is not ready to be used as a trusted source of information for scientific writing. Scientific writers who rely on ChatGPT must manually check all of the "facts," statements, and references it generates. Therefore, there is no obvious advantage to writing with ChatGPT.
      Scientific research is a human endeavor to uncover the truth and must be conducted in accordance with the highest ethical standards. Publication of incorrect data diverts research from the truth and spreads misinformation. Careful data management is therefore crucial, as it directly reflects the integrity of researchers. Poor data management is scientific misconduct and should be avoided at all costs. Fabricating or falsifying findings is serious scientific fraud and a violation of the ethical standards of scientific research. Such misconduct can lead to sanctions against researchers and can even end their academic careers. For a ChatGPT-generated article, all listed authors are directly responsible for the accuracy and integrity of the output and are accountable for any misconduct; ChatGPT cannot be held accountable.
      As demonstrated by our example, identifying fabrication or falsification during peer review of a manuscript containing text generated by ChatGPT will pose a significant challenge for reviewers and editors. This challenge becomes even more worrisome if the actual authors of the paper do not perform careful fact-checking. A recent study revealed that human reviewers missed up to 32% of abstracts that were fully fabricated by ChatGPT, even when consciously screening for them [8].

      Scientific journals that accept ChatGPT's involvement in writing will face a significant increase in retractions of published articles and a loss of credibility. Moreover, directly adopting full text written by ChatGPT may constitute plagiarism and violate the code of conduct of scientific publishing, as originality is the foundation of scientific writing. Work that relies solely on ChatGPT's output lacks the critical thinking and reasoning skills of a human being and can be detrimental to research and impede scientific advancement.
      In conclusion, while ChatGPT may hold endless possibilities for the future, its current form is far from mature enough to handle scientific writing. The future role of a more advanced ChatGPT in scientific writing will require comprehensive discussion and debate. The Science family of journals recently banned all ChatGPT-generated text, figures, images, and graphics; a violation of this policy will constitute scientific misconduct no different from altered images or plagiarism of existing works [9]. Given the potential implications of fabricated and inaccurate information generated by ChatGPT, such a policy should become standard practice for all scientific publishing. Nonetheless, ChatGPT can still be a useful tool for checking grammar and syntax errors and for refining language, particularly for non-native speakers.

      Appendix. SUPPLEMENTARY DATA

      References

      1. OpenAI. ChatGPT. 2022. Accessed February 12, 2023. https://openai.com/blog/chatgpt/.
      2. Stokel-Walker C. ChatGPT listed as author on research papers: many scientists disapprove. Nature. 2023;613:620-621.
      3. Biswas S. ChatGPT and the future of medical writing. Radiology. Published online February 2023.
      4. Liebrenz M, Schleifer R, Buadze A, Bhugra D, Smith A. Generating scholarly content with ChatGPT: ethical challenges for medical publishing. Lancet Digit Health. 2023.
      5. Zheng H. It is time to abandon the use of body surface area derived from a 100-year-old formula. Am J Med. 2022;135:e308-e310.
      6. Shuter B, Aslani A. Body surface area: Du Bois and Du Bois revisited. Eur J Appl Physiol. 2000;82:250-254.
      7. Mosteller RD. Simplified calculation of body-surface area. N Engl J Med. 1987;317:1098.
      8. Gao CA, Howard FM, Markov NS, et al. Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. bioRxiv. 2022:2022.12.23.521610.
      9. Thorp HH. ChatGPT is fun, but not an author. Science. 2023;379:313.