AI Powered Healthcare Application

By Hassan Farooq

Overload, 33(188):17-19, August 2025


Many people are raving about AI. Hassan Farooq describes how he used it in a project so you can learn how to build an AI model.

Problem

In the healthcare sector, there is a challenge in delivering healthcare advice that is accessible and personalised to patients’ needs in a timely manner, especially for individuals who live in areas with limited medical services. This can lead people to rely on online sources that may be unreliable. As a result, they may feel anxious because they don’t know whether the information is correct, and if they follow incorrect advice their health may worsen.

Meanwhile, healthcare professionals are overwhelmed by high workloads, including non-urgent consultations that could be handled by digital tools. This puts additional strain on healthcare systems, diverting critical resources from urgent care needs.

Therefore, there is an increasing need for intelligent systems that provide users with personalised advice. The key challenge lies in developing a solution that integrates generative AI to offer accurate, timely and user-friendly recommendations while adhering to medical standards.

Implementation

This report documents the development of an AI-Powered Healthcare web application as part of my final year project for the Software Engineering BEng (Hons) at the University of Bradford. I had around eight months to complete this project alongside my modules, and with limited resources: I used only the tools and technologies within my capacity, without any external support.

I developed an AI-Powered Healthcare web application that integrates Generative AI to provide users with personalised medical advice and support.

To interact with the chatbot, users must securely register and log in. I built the authentication system using JWT authentication [JWT] and role-based access control.
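
The token flow can be sketched with nothing but the standard library. This is not the project’s actual implementation (the back end is Java and presumably uses a JWT library); it is a minimal illustration of how an HS256-signed JWT is built and verified:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)

def verify_jwt(token: str, secret: bytes) -> dict:
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), signature):
        raise ValueError("invalid signature")
    payload_b64 = signing_input.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

token = sign_jwt({"sub": "user42", "role": "patient"}, b"server-secret")
print(verify_jwt(token, b"server-secret")["role"])  # patient
```

The `role` claim in the payload is what role-based access control inspects on each request.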

When using the chatbot for the first time, users are required to read and accept a disclaimer. This disclaimer outlines the chatbot’s purpose and includes consent for the use of personal data. See Figure 1.

Figure 1

The chatbot initially asks for the user’s age and gender. If the input is invalid or irrelevant, it handles the response gracefully with appropriate prompts. Once this information is received, the chatbot then asks about the user’s health concern. If the input provided is too vague, it will ask the user to provide more detail.

To handle vague inputs, I implemented a vague input detection system. This works by comparing the user’s input to a set of predefined vague phrases using the SentenceTransformer model [SBERT]. Both the user input and the vague phrases are converted into vector embeddings and the cosine similarity between them is calculated. If the similarity score exceeds a predefined threshold, the input is considered vague and the chatbot prompts the user to elaborate. See Figure 2.

Figure 2
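
The vague-input check can be sketched as below. The real system encodes text with a SentenceTransformer model; here a toy bag-of-words embedding (and an assumed threshold of 0.75) stands in so the thresholding logic is runnable:

```python
import math

# Stand-in for SentenceTransformer.encode(): a toy bag-of-words embedding.
# The real system would call model.encode(text) from sentence-transformers.
VOCAB = ["i", "feel", "unwell", "sick", "bad", "headache",
         "have", "had", "a", "for", "three", "days"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

VAGUE_PHRASES = ["i feel unwell", "i feel sick", "i feel bad"]
THRESHOLD = 0.75  # assumed value; would be tuned on real embeddings

def is_vague(user_input: str) -> bool:
    u = embed(user_input)
    return max(cosine(u, embed(p)) for p in VAGUE_PHRASES) > THRESHOLD

print(is_vague("i feel sick"))                           # True: prompt for detail
print(is_vague("i have had a headache for three days"))  # False: specific enough
```

When `is_vague` returns `True`, the chatbot asks the user to elaborate instead of forwarding the input to the language model.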

Once a clear health concern is provided, the chatbot sends this information (age, gender and health issue) to the Mistral AI language model [Mistral]. To improve the accuracy and relevance of the responses, I enhanced the model using Retrieval-Augmented Generation (RAG), which incorporates official NHS health articles stored as vector embeddings in the Pinecone database [Pinecone]. RAG was the best technique to apply because it allowed me to use authoritative NHS articles as a factual base, improving trust and accuracy. Implementing it involved web scraping and cleaning NHS pages, converting the content to embeddings using SentenceTransformers, storing them in Pinecone, and querying them with the user’s input before sending the retrieved context to the language model.
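
The retrieval step can be shown in miniature. In production, the article embeddings come from SentenceTransformers and live in a Pinecone index; this self-contained sketch substitutes word-count vectors and an in-memory list, and the asthma URL and all snippet texts are illustrative rather than the project’s actual corpus:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for SentenceTransformer embeddings: a word-count vector
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Scraped-and-cleaned article snippets; texts here are illustrative.
ARTICLES = [
    ("https://www.nhs.uk/symptoms/headaches/",
     "headache pain head dehydration rest drink fluids"),
    ("https://www.nhs.uk/conditions/asthma/",
     "asthma breathing wheeze inhaler chest tightness"),
]
INDEX = [(url, embed(text), text) for url, text in ARTICLES]  # in-memory "Pinecone"

def retrieve(query: str, k: int = 1) -> list:
    # Rank stored articles by similarity to the user's concern, keep top k
    q = embed(query)
    return sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)[:k]

def build_prompt(age: int, gender: str, concern: str) -> str:
    context = "\n".join(f"Source {url}: {text}" for url, _, text in retrieve(concern))
    return (f"Using only the NHS context below, advise a {age}-year-old {gender} "
            f"in a conversational tone and cite the source URL.\n"
            f"{context}\nConcern: {concern}")

print(build_prompt(30, "male", "I have a headache due to dehydration"))
```

The real system assembles the prompt similarly from the top Pinecone matches, with instructions to cite the source URLs.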

I applied prompt engineering to the language model to set the context to healthcare and to ensure it considers the user’s age and gender when generating health advice. As a result, the model generates responses in a conversational tone while referencing information from the NHS database, and it also includes links to the original NHS sources. This promotes user trust and transparency, and allows users to explore their health issues further on the NHS website. See Figure 3.

Figure 3

The chatbot features a dynamic input button enabling users to interact via voice recognition or typed text, powered by Google’s Speech Recognition API [Google].

Users can also review previous conversations as well as start new chats or delete old ones, giving them full control over their chat history.

Additionally, I implemented a Health Assessment feature on a separate page (Figure 4). Here, users answer a series of yes/no health-related questions and at the end, their responses are sent to the language model. The model then provides general health advice and recommendations, allowing users to benefit from the chatbot’s support even if they don’t have a specific health issue.

Figure 4
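
A sketch of how the yes/no answers might be assembled into a single prompt; the question list and wording here are illustrative, not the application’s actual questions:

```python
# Illustrative question list for the Health Assessment page
QUESTIONS = [
    "Do you exercise at least three times a week?",
    "Do you sleep seven or more hours a night?",
    "Do you smoke?",
]

def assessment_prompt(answers: list[bool]) -> str:
    # Pair each question with its Yes/No answer and ask the model for advice
    lines = [f"- {q} {'Yes' if a else 'No'}" for q, a in zip(QUESTIONS, answers)]
    return ("You are a healthcare assistant. Based on these yes/no answers, "
            "give general lifestyle advice:\n" + "\n".join(lines))

prompt = assessment_prompt([True, False, True])
# This prompt string is then sent to the language model (Mistral in this project)
```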

Given the sensitive nature of the application, data protection was a key priority throughout development.

  • User Consent – Before interacting with the chatbot, users must accept a disclaimer and consent to the use of their data. This step ensures informed participation and transparency.
  • Data Minimisation – Only minimal data is collected, such as age, gender and the health concern.
  • Secure Storage – User passwords are hashed using an industry-standard algorithm, bcrypt.
  • Chat History – Chat history is stored in a secured MySQL database and only the user has control over their previous chats: they can view them, delete them or start new conversations, keeping their health interactions private.
  • Access Control – Role-based access control ensures that only authorised users, such as admins, can manage sensitive user data.
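
Salted password hashing can be illustrated as follows. The application uses bcrypt, which requires a third-party package, so this sketch demonstrates the same idea (a per-user random salt plus a deliberately slow hash) with the standard library’s PBKDF2:

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative; real deployments tune this upwards

def hash_password(password: str) -> tuple[bytes, bytes]:
    # A fresh random salt per user defeats precomputed rainbow tables
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Constant-time comparison avoids timing side channels
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))                   # False
```

Only the salt and digest are stored in the database; the plaintext password is never persisted.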

The final tech stack includes React (front end), Java (back end), Python (the chatbot and RAG logic), MySQL for data storage and Pinecone for storing NHS data as vector embeddings.

Chatbot testing

Testing was conducted to assess the effectiveness of my chatbot’s responses against reference answers. I took 10 random questions from the MedQuAD dataset. MedQuAD (the Medical Question Answering Dataset) consists of question-and-answer pairs curated from 12 trusted National Institutes of Health (NIH) websites.

I wrote a testing script called evaluate_chatbot.py which sends the predefined MedQuAD questions to my chatbot. Each response is collected and compared with GPT-4’s (ChatGPT) [GPT] answer to see which is more semantically similar to the expected answer. See Figure 5.

Figure 5

Comparison is done using BERT-based sentence embeddings (all-MiniLM-L6-v2) to calculate cosine similarity. This is effectively an automated semantic evaluation: BERT similarity checks whether my chatbot conveys the same idea as the reference answer, which is critical in medical and conversational AI tasks.
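
The comparison loop of evaluate_chatbot.py might look like the sketch below. The real script scores answers with all-MiniLM-L6-v2 embeddings and cosine similarity; a simple token-overlap (Jaccard) score stands in here so the harness runs without the sentence-transformers dependency:

```python
def jaccard(a: str, b: str) -> float:
    # Crude semantic proxy: fraction of shared words between two answers
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def compare(expected: str, chatbot: str, gpt: str, score=jaccard) -> str:
    # Whichever answer is closer to the reference answer wins the question
    return "chatbot" if score(expected, chatbot) >= score(expected, gpt) else "gpt"

winner = compare(
    expected="glaucoma damages the optic nerve and can cause sight loss",
    chatbot="glaucoma is damage to the optic nerve that can cause sight loss",
    gpt="it is an eye condition",
)
print(winner)  # chatbot
```

Swapping `score` for a function that encodes both strings with a SentenceTransformer and returns their cosine similarity recovers the real evaluation.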

Results and discussion

Based on the results of the test, ChatGPT, a state-of-the-art model, generated long and nuanced answers, whereas my open-source model, running on the free tier, generated responses that were truncated. Nevertheless, the results demonstrate that my chatbot, with the support of RAG and prompt engineering, is quite competitive with ChatGPT even with its shorter answers. For a couple of questions my chatbot nailed the answer, but it also fell short on some.

By integrating Retrieval-Augmented Generation (RAG) into my chatbot and using NHS articles as the knowledge base, the chatbot retrieves factual information directly from NHS resources before responding to users. This ensures that responses are grounded in reliable data, reducing the risk of the chatbot hallucinating (generating incorrect answers), which is particularly important in the healthcare domain. For example, if a user’s health concern is expressed as, “I have a headache due to dehydration,” the model would generate a response that includes relevant NHS articles, such as https://www.nhs.uk/symptoms/headaches/ along with other related resources.

Although the NHS articles are static, the chatbot interacts with them to craft responses in a conversational tone, tailoring the information to suit the user’s queries. Additionally, the chatbot can provide relevant links to NHS articles, which allows users to explore and read more about their health concerns if they wish. Ultimately, the responses generated by the chatbot are backed by trusted sources, which may enhance user trust and satisfaction, and allows the chatbot to generate better responses than the base model alone.

On the other hand, a limitation of the chatbot is that the accuracy of its advice depends heavily on the quality of user input. If users fail to clarify their condition or provide enough detail, the chatbot’s responses may lack precision. For example, a user reporting difficulty in breathing should indicate whether they have been diagnosed with asthma, say how long the symptoms have lasted and describe any pain, so the chatbot can tailor its advice accordingly.

Additionally, the Mistral model was selected as a feasible language model due to its availability in a free version, but this comes with limitations because once the free tier limit is exceeded, payment is required.

Furthermore, only 170 NHS articles were used in my chatbot, which means it may not have evidence from a reliable source for every healthcare scenario. A greater number of NHS articles and other reliable sources would allow the chatbot to cover more scenarios and might improve the results of the chatbot test.

Challenges

My initial plan was to use a language model and fine tune it with datasets related to healthcare. However, most of the models I came across had a large number of parameters, required payment and demanded high memory to run locally. This made it challenging to find a language model that was both suitable and feasible to run on a laptop with 8GB of RAM.

Eventually, I came across the Mistral AI language model, which was suitable. My next step was to fine-tune it, but I discovered that this process requires a lot of computational power. Additionally, fine-tuning is more research oriented and the time required for this task would have exceeded the timeframe I had for the project.

One of the biggest challenges I faced was figuring out a way to validate the AI’s responses instead of simply relying on a third party language model to generate answers. That’s when I discovered Retrieval-Augmented Generation (RAG). After researching it and assessing its suitability, I decided to implement it.

To test the accuracy of the chatbot’s advice, I was limited to using 10 questions from the MedQuad dataset. Adding more questions would have exceeded the free tier limit of the language model.

Whilst searching for a reliable dataset, I discovered that RAG allows you to build a knowledge base from URLs by extracting and cleaning their content and converting it into embeddings. Therefore, I chose to use NHS URLs as the source for the RAG knowledge base.

Future work

For future work, more state-of-the-art tools and models such as the latest GPT model or DeepSeek [DeepSeek] could be integrated. This would be straightforward, involving a simple plug-and-play approach to enhance the system’s capabilities. Expanding the dataset with more NHS articles or other reliable resources would improve the system’s coverage of healthcare scenarios.

Additionally, prompt tuning could replace prompt engineering to set the healthcare context of the language model more effectively. By learning the task’s prefix rather than relying on handcrafted prompts, this approach has the potential to achieve higher accuracy.

For performance improvement, the model could run locally on a GPU if feasible, which would make responses much faster. In future, expert human evaluation would help assess the medical validity of the chatbot’s responses.

Ultimately, my end goal is to have this project deployed on a cloud platform. It will remain open to improvement, and I will add features in my own time.

If the project were to become a commercial product, it would require medical device approval, GDPR compliance, clinical safety validation and transparency in AI usage. The process would be complex, requiring consultation with legal, clinical and regulatory experts.

Technology overview

Retrieval-Augmented Generation (RAG): RAG is a generative AI technique which allows you to augment language models with external data.

Prompt Engineering: Prompt engineering is used to set the context and guide language models so they can understand the question and give an appropriate response.

Pinecone: A vector database used for storing embeddings (numerical representations of text) so the system can retrieve the most relevant health information.

Cosine Similarity: A mathematical measure of the similarity between two vectors (text embeddings), calculated as the cosine of the angle between them. It indicates how similar two pieces of text are in meaning, regardless of their magnitude.

SentenceTransformer: SentenceTransformer is a library (based on models like BERT) that converts sentences or texts into dense vector embeddings. These embeddings can then be used for tasks like semantic search, clustering, and similarity comparison.

Vector Embeddings: These are numerical representations of text, where semantically similar texts have similar vectors.

State of the art AI health chatbots

Many online solutions are designed to let patients manage their health, offering quick and useful advice based on the information they input, without the need to book an appointment.

The Ada Health app [Ada24] was launched in 2011 by a global company founded by medical experts. Its AI system aims to make healthcare easier and more effective for users, allowing them to manage their health independently, and it uses artificial intelligence to help diagnose symptoms [Singh21]. You input your symptoms and answer some questions from the chatbot; the app then analyses your input and answers to provide possible diagnoses and advice.

The app compares your symptoms with the medical dictionaries on which it has been trained and, based on that, generates a personalised report. It may also be able to assist you with a range of symptoms such as anxiety, pain, allergies and headaches. It’s a useful tool for getting a quick view of your health problems.

The NHS also has a chatbot known as the Limbic Access chatbot and it’s used to streamline the mental health referral process within the NHS [NHS22]. It helps services like Mind Matters Surrey NHS (IAPT) by acting as a digital front door for patients seeking mental health support.

Instead of calling or filling out a long form, you just chat with the bot online. When someone wants help, the chatbot asks them friendly, step-by-step questions about how they’re feeling. It collects important details, like symptoms or if they’re at risk and sends that information to the NHS team. This helps staff save time because they don’t have to ask those questions again. Ultimately, it makes the process of asking for mental health help quicker, easier and less stressful for both patients and staff.

References

[Ada24] Ada: https://ada.com/about/

[DeepSeek] DeepSeek: https://www.deepseek.com/

[Google] Speech Recognition: https://pypi.org/project/SpeechRecognition/

[GPT] Generative Pre-trained Transformer: https://openai.com/index/introducing-gpt-4-5/

[JWT] ‘Introduction to JSON Web Tokens’: https://jwt.io/introduction

[Mistral] Mistral AI: https://mistral.ai

[NHS22] NHS Transformation Directorate (2022), ‘Using an AI chatbot to streamline mental health referrals’, available at: https://transform.england.nhs.uk/key-tools-and-info/digital-playbooks/workforce-digital-playbook/using-an-ai-chatbot-to-streamline-mental-health-referrals/

[Pinecone] Pinecone Vector Database: https://www.pinecone.io

[SBERT] SentenceTransformers: https://www.sbert.net/

[Singh21] Singh, V., (2021) ‘Benzinga: Artificial intelligence doctor app Ada Health closes $90M funding led by Bayer, Samsung’, available at https://www.benzinga.com/m-a/21/05/21322232/artificial-intelligence-doctor-app-ada-health-closes-90m-funding-led-by-bayer-samsung

Hassan Farooq is a Software Engineering BEng (Hons) graduate from the University of Bradford. He is passionate about building intelligent systems that solve real-world problems, drive digital transformation and help businesses innovate. His interests include AI, programming languages, cloud computing, healthcare technology and the ethics of software development.





