Personal health advice and Artificial Intelligence: an assessment of ChatGPT responses for common health conditions in the southern United States
Abstract
Artificial intelligence is here, and it is already being applied in health sciences education and healthcare. We assessed ChatGPT's responses to three common questions about health problems often seen in the southern United States: hypercholesterolemia, diabetes, and hypertension. We asked ChatGPT what to do when a doctor told us we had each of these conditions, and had public health and nursing faculty, along with one anthropologist, at two southern institutions rate the responses on four items of interest: medical care bias, advice suggesting lifestyle changes, resources suggested, and reliability of those resources for self-care. Interrater agreement was assessed, as was the reading level of the responses. Raters generally agreed that the information was not medically biased, did give lifestyle-change advice, did not suggest appropriate resources, and did not provide referenced sources of information (Cronbach's alpha = .91). Interrater comparisons were also made: agreement was >.70 among the nurses and >.70 among the public health personnel. The anthropologist, who is also a nurse, tended to rate in line with the nurses. A Flesch-Kincaid assessment placed the output at an 11.8 grade reading level. Additional research should examine the trustworthiness of AI-generated health advice, and AI algorithms should take into account the reading level of average Americans.
Keywords
Nursing, Informatics, Artificial Intelligence, Health Promotion, Preventive Care
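For readers who wish to see how the internal-consistency statistic reported in the abstract is computed, the following is a minimal sketch in Python of Cronbach's alpha over a rater-by-item score matrix. The ratings below are hypothetical placeholders, not the study's data, and will not reproduce the reported value of .91.

```python
# Minimal sketch (hypothetical data): Cronbach's alpha for a
# rater-by-item score matrix. Placeholder ratings only; these
# will not reproduce the study's reported alpha of .91.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (raters x items) matrix of scores."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of rater totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical ratings: 6 raters x 4 items, on a 1-5 scale
ratings = np.array([
    [4, 4, 2, 2],
    [5, 4, 1, 2],
    [4, 5, 2, 1],
    [5, 5, 1, 1],
    [3, 4, 2, 2],
    [4, 4, 1, 1],
])
print(f"Cronbach's alpha = {cronbach_alpha(ratings):.2f}")
```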
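The Flesch-Kincaid grade level reported in the abstract follows a standard published formula. The sketch below implements that formula with a rough syllable-counting heuristic; dedicated readability tools count syllables more carefully, so exact scores may differ.

```python
# Minimal sketch of the standard Flesch-Kincaid grade-level formula:
#   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
# The syllable counter is a crude heuristic, for illustration only.
import re

def count_syllables(word: str) -> int:
    # Approximate: count vowel groups, with a crude silent-e adjustment.
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and groups > 1:
        groups -= 1
    return max(groups, 1)

def flesch_kincaid_grade(text: str) -> float:
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

sample = "Consult your physician before making significant dietary changes."
print(f"Grade level: {flesch_kincaid_grade(sample):.1f}")
```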