Missing data can disrupt machine learning workflows, but imputation can fill in the blanks and keep your models on track. Autoencoders, a type of neural network, excel at reconstructing data by learning complex patterns, and can outperform traditional methods like random forest imputation. In our experiment on housing data, autoencoders reduced errors by a factor of 3 to 6 across features, demonstrating their effectiveness in handling intricate feature relationships. This makes them a powerful tool for imputation and beyond, with applications in denoising, feature extraction, and anomaly detection.
Explaining a Passenger Survival AI Model Using SHAP for the RMS Titanic
In 1912, the RMS Titanic hit an iceberg in the North Atlantic Ocean about 400 miles south of Newfoundland, Canada, and sank. There were not enough lifeboats onboard to accommodate all passengers, and 67% of the passengers died. In this article, we walk through the use of SHAP values to explain in detail why an AI model predicts that a given passenger will or will not survive.
Securing the Conversational Frontier: Advanced Red Team Testing Techniques for Chatbots
Chatbots, now omnipresent, face a crisis of accuracy and security highlighted by recent public blunders at Air Canada and Chevrolet, where bots made unintended promises. Air Canada's attempt to deflect blame onto its bot was rejected by authorities, underscoring a harsh reality: companies are responsible for their bots' actions. Despite the prowess of language models like ChatGPT, their inherent tendency to occasionally fabricate with confidence poses unique challenges. Drawing lessons from cybersecurity, this article explores four advanced red team testing strategies aimed at reining in bot misstatements and bolstering chatbot security.
From RAG to Riches: A Practical Guide to Building Semantic Search Using Embeddings and the OpenSearch Vector Database
In this article, we trace the evolution of search technology from conventional keyword-based methods to semantic search. We discuss how semantic search uses sentence embeddings to capture the context and intent behind user queries, improving the accuracy and relevance of search results. Using vector databases such as OpenSearch, we illustrate how to build semantic search systems that can handle the complexity of modern data sets.
Measuring Accuracy and Trustworthiness in Large Language Models for Summarization & Other Text Generation Tasks
Large Language Models (LLMs) are increasingly popular due to their ability to complete a wide range of tasks. However, assessing their output quality remains a challenge, especially for complex tasks where there is no standard metric. Fine-tuning LLMs on large datasets for specific tasks may improve their efficacy and accuracy. In this article, we explore potential ways to assess LLM output quality.
Practical Applications of AI and NLP for Automated Text Generation
In this article, we explore some practical uses of AI-driven automated text generation. We demonstrate how technologies like GPT-3 can improve your business applications by automatically generating training data that can be used to bootstrap your machine learning models. We also illustrate example uses of language transformations, such as transforming English into legalese or spoken text into written text.
Inside the Black Box: Developing Explainable AI Models Using SHAP
Explainable AI refers to the ability to interpret model outcomes in a way that is easily understood by human beings. We explore why this matters and discuss in detail tools that help shine light inside the AI "black box". Our goal is not just to understand feature importance at the population level, but to quantify feature importance on a per-outcome basis.