AI Tools Notes & Disclaimers

Guiding principles

All our apps are created in line with our AI policy and principles.

  • People first: Use gen AI tools to support human creativity and judgment, not to replace them. Empower human team members to detect and sift out inaccuracy, bias and sludge. Apply human judgment to ensure that principles of data equity are upheld when processing data using AI.

  • Data privacy: Ensure that our use of gen AI applications aligns with LogicalOutcomes’ data privacy and security procedures. This involves taking measures to ensure that sensitive and identifying information is stewarded in secure environments and client data is never used for training purposes. When developing client-facing apps, any use of user data for process improvement must be clearly signposted, and user permission sought. 

  • Transparency: Clearly document our use of gen AI and disclose this use in evaluation and research reports. Ensure that partners and clients are aware of how we’re leveraging AI to improve our work. Be open in discussing the risks associated with AI as well as its benefits, and inform people about the practices we follow to mitigate those risks.

  • Responsible development: Apply the Principles for Digital Development when developing and deploying gen AI tools. Work with users to co-design tools that meet requirements around functionality, security, sustainability, accessibility, and compatibility with existing technology. Resist the lure of magical thinking about AI.   

  • Learn and improve: Stay on top of developments in AI technology. Try new things. Gather and share feedback on the experience of using gen AI tools and the outcomes they produce. Channel insights towards improvement.  

How we created our AI apps

We developed our AI tools using Dify, an open-source platform for creating AI applications. The tools follow structured workflows that use a series of prompts to break tasks into sub-tasks, improving accuracy and usability. Instead of issuing a single broad prompt like "Create a logic model," we decompose the task into smaller steps to enhance the quality of the final output.

We chose Dify because it allows flexibility in working with different Large Language Models (LLMs) as they improve, while maintaining a systematic approach to complex tasks. Our implementation uses the OpenAI and Gemini APIs, ensuring that we leverage state-of-the-art AI models while benefiting from Dify’s low-code AI workflow capabilities.
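To make the decomposition idea concrete, here is a minimal sketch in Python. It assumes the OpenAI Python SDK and an illustrative model name; the actual apps implement these steps as nodes in a Dify workflow rather than hand-written code, and the sub-task prompts shown are invented for the example.

```python
# Minimal sketch of prompt decomposition: instead of one broad prompt
# ("Create a logic model"), the task is split into sequential sub-tasks,
# each with its own focused prompt. Illustrative only; the production
# apps implement these steps as nodes in a Dify workflow.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """Send one focused sub-task prompt and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice, not necessarily ours
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Each step consumes the output of the previous one.
program_description = "A community program offering job-skills workshops."
outcomes = ask(f"List the intended outcomes of this program:\n{program_description}")
activities = ask(f"List the activities that produce these outcomes:\n{outcomes}")
logic_model = ask(
    "Draft a logic model table linking these activities to these outcomes:\n"
    f"Activities:\n{activities}\nOutcomes:\n{outcomes}"
)
print(logic_model)
```

Because each sub-task has a narrow prompt and receives only the context it needs, errors are easier to localize than with one broad prompt, and individual steps can be re-run or pointed at a different model without rebuilding the whole chain.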

Security and confidentiality

When using these apps, your data privacy and security are top priorities. The OpenAI and Gemini APIs process inputs in real time but do not retain the data or use it for training. Dify implements strong security protocols and adheres to data protection regulations.

To further protect privacy, we recommend that files uploaded to the system do not contain personally identifiable client information. Before submitting data, review it for sensitive details (e.g., health records) and remove identifiable elements such as names, email addresses, and record numbers. If collecting sensitive data is necessary, conduct a privacy assessment first to ensure responsible handling. 
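As a rough illustration of that pre-submission review, the sketch below strips obvious identifiers with regular expressions. This is a starting point for illustration, not a built-in feature of our apps: the record-ID pattern is invented for the example, and names and other context-dependent identifiers still need a human pass.

```python
# Rough sketch of a pre-submission scrub: strip obvious identifiers
# (emails, phone-like numbers, record IDs) before uploading a file.
# Regex redaction is a starting point, not a guarantee; names and other
# context-dependent identifiers still need a human review pass.
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "record_id": re.compile(r"\b[A-Z]{2,}-?\d{4,}\b"),  # assumed ID format; adjust to your data
}

def redact(text: str) -> str:
    """Replace matched identifiers with placeholder tokens."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

sample = "Contact Jane at jane.doe@example.org or 416-555-0199 re: file HR-20231."
print(redact(sample))
# Note that "Jane" survives the scrub: names are exactly the kind of
# identifier that automated patterns miss and a person must catch.
```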

Disclaimers 

These tools are powered by generative AI, specifically Large Language Models (LLMs). The world of LLMs is changing rapidly. When you use these tools, you need to know that they are unpredictable and often inaccurate.  

Every time you use an LLM to ask a complex, open-ended question, you will get a different answer. As noted by Lethain (2023):

"Because you cannot rely on LLMs to provide correct responses, and you cannot generate a confidence score for any given response, you have to either accept potential inaccuracies or keep a Human-in-the-Loop (HITL) to validate the response." 

For our evaluation tools, we have taken several steps to improve reliability and reduce the risk of incorrect or unhelpful results: 

  • We developed a structured evaluation framework that ensures a consistent approach.

  • Each step of the process is divided into smaller tasks with specific prompts to reduce errors. 

  • We tested initial prompts on a less capable LLM to refine the instructions before deploying them with a more advanced model, improving accuracy.

  • Automated review and correction mechanisms are built into the tool (see the sketch after this list).

  • We ask users to copy and paste the results into a document for review, so that a person acts as the 'human in the loop'.
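The sketch below illustrates the automated review-and-correction idea, again reusing the hypothetical ask() helper from the earlier example: a second prompt critiques each draft, and the draft is revised until the critique passes or a retry limit is reached. The real mechanism is built into the Dify workflow; the APPROVED convention and the retry limit here are assumptions for illustration.

```python
# Sketch of an automated review-and-correction loop: a second prompt
# checks the first draft against simple criteria, and the draft is
# revised until the check passes or attempts run out. Uses the
# hypothetical ask() helper from the earlier sketch.

def generate_with_review(task_prompt: str, max_attempts: int = 3) -> str:
    draft = ask(task_prompt)
    for _ in range(max_attempts):
        review = ask(
            "Review the draft below for factual errors, missing sections, "
            "and unsupported claims. Reply APPROVED if none are found; "
            f"otherwise list the problems.\n\nDraft:\n{draft}"
        )
        if review.strip().startswith("APPROVED"):
            break
        draft = ask(f"Revise the draft to fix these problems:\n{review}\n\nDraft:\n{draft}")
    # Even an approved draft still goes to a person for the final
    # review described above (the human in the loop).
    return draft
```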

Most of the evaluation effort should go into communicating with participants, diverse interest groups and decision-makers, exploring emerging findings with them, and coming to shared conclusions about how to improve services. That process in itself will reduce the risk of inaccuracies.

The accuracy of the output will be the result of a collaboration between you and the LLMs. Users are responsible for the final result and for how they use it.

LLMs are an incredibly useful resource. Working with them is like working with a weird intern who is astoundingly knowledgeable in some areas and completely lacking in judgment in others; you don't know which until you ask them to do something, and then they improve by the next month anyway. They need close supervision but can help you do a lot. Use them with care, but also (we suggest) with an appreciation of how helpful they can be.