A vital part of shipping software securely is red teaming. It broadly refers to the practice of emulating real-world adversaries and their tools, tactics, and procedures to identify risks, uncover blind spots, validate assumptions, and improve the overall security posture of systems. Microsoft has a rich history of red teaming emerging technology with a goal of proactively identifying failures in that technology. As AI systems became more prevalent, Microsoft established the AI Red Team in 2018: a group of interdisciplinary experts dedicated to thinking like attackers and probing AI systems for failures.
We're sharing best practices from our team so others can benefit from Microsoft's learnings. These best practices can help security teams proactively hunt for failures in AI systems, define a defense-in-depth approach, and create a plan to evolve and grow their security posture as generative AI systems evolve.
The practice of AI red teaming has evolved to take on a more expanded meaning: it not only covers probing for security vulnerabilities, but also includes probing for other system failures, such as the generation of potentially harmful content. AI systems come with new risks, and red teaming is core to understanding those novel risks, such as prompt injection and the generation of ungrounded content. AI red teaming is not just a nice-to-have at Microsoft; it is a cornerstone of responsible AI by design: as Microsoft President and Vice Chair Brad Smith announced, Microsoft recently committed that all high-risk AI systems will go through independent red teaming before deployment.
The goal of this blog is to contextualize for security professionals how AI red teaming intersects with traditional red teaming, and where it differs. We hope this will empower more organizations to red team their own AI systems, and provide insights into making better use of their existing traditional red teams and AI teams.
Red teaming helps make AI implementation safer
Over the last several years, Microsoft's AI Red Team has consistently created and shared content to empower security professionals to think comprehensively and proactively about how to implement AI securely. In October 2020, Microsoft collaborated with MITRE as well as industry and academic partners to develop and release the Adversarial Machine Learning Threat Matrix, a framework for empowering security analysts to detect, respond to, and remediate threats. Also in 2020, we created and open sourced Microsoft Counterfit, an automation tool for security testing AI systems, to help the whole industry improve the security of AI solutions. Following that, we released the AI security risk assessment framework in 2021 to help organizations mature their security practices around the security of AI systems, in addition to updating Counterfit. Earlier this year, we announced additional collaborations with key partners to help organizations understand the risks associated with AI systems so that organizations can use them safely, including the integration of Counterfit into MITRE tooling and a collaboration with Hugging Face on an AI-specific security scanner that is available on GitHub.

Security-related AI red teaming is part of a larger responsible AI (RAI) red teaming effort that focuses on Microsoft's AI principles of fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. This collective work has had a direct impact on the way we ship AI products to our customers. For instance, before the new Bing chat experience was released, a team of dozens of security and responsible AI experts across the company spent hundreds of hours probing for novel security and responsible AI risks. This was in addition to the regular, intensive software security practices followed by the team, as well as red teaming of the base GPT-4 model by RAI experts in advance of developing Bing Chat. Our red teaming findings informed the systematic measurement of these risks and helped build scoped mitigations before the product shipped.
Guidance and resources for red teaming
AI red teaming generally takes place at two levels: at the base model level (e.g., GPT-4) or at the application level (e.g., Security Copilot, which uses GPT-4 in the back end). Both levels bring their own advantages: for instance, red teaming the model helps to identify early in the process how models can be misused, to scope the model's capabilities, and to understand its limitations. These insights can be fed into the model development process to improve future model versions, and also give a jump-start on which applications the model is best suited for. Application-level AI red teaming takes a system view, of which the base model is one part. For instance, when AI red teaming Bing Chat, the entire search experience powered by GPT-4 was in scope and was probed for failures. This helps to identify failures beyond just the model-level safety mechanisms, by including the application-specific safety triggers as a whole.

Together, probing for both security and responsible AI risks provides a single snapshot of how threats and even benign usage of the system can compromise the integrity, confidentiality, availability, and accountability of AI systems. This combined view of security and responsible AI provides valuable insights not just for proactively identifying issues, but also for understanding their prevalence in the system through measurement and informing strategies for mitigation. Below are key learnings that have helped shape Microsoft's AI Red Team program.
- AI red teaming is more expansive. AI red teaming is now an umbrella term for probing both security and RAI outcomes. AI red teaming intersects with traditional red teaming goals in that the security component focuses on the model as a vector. So, some of the goals may include, for instance, stealing the underlying model. But AI systems also inherit new security vulnerabilities, such as prompt injection and poisoning, which need special attention. In addition to the security goals, AI red teaming also includes probing for outcomes such as fairness issues (e.g., stereotyping) and harmful content (e.g., glorification of violence). AI red teaming helps identify these issues early so we can prioritize our defense investments appropriately.
- AI red teaming focuses on failures from both malicious and benign personas. Take the case of red teaming the new Bing. There, AI red teaming not only focused on how a malicious adversary can subvert the AI system via security-focused techniques and exploits, but also on how the system can generate problematic and harmful content when regular users interact with it. So, unlike traditional security red teaming, which mostly focuses only on malicious adversaries, AI red teaming considers a broader set of personas and failures.
- AI systems are constantly evolving. AI applications routinely change. For instance, in the case of a large language model application, developers may change the metaprompt (the underlying instructions to the ML model) based on feedback. While traditional software systems also change, in our experience AI systems change at a faster rate. Thus, it is important to pursue multiple rounds of red teaming of AI systems and to establish systematic, automated measurement and monitoring of those systems over time.
- Red teaming generative AI systems requires multiple attempts. In a traditional red teaming engagement, using a tool or technique at two different points in time on the same input would always produce the same output; in other words, traditional red teaming is generally deterministic. Generative AI systems, on the other hand, are probabilistic: running the same input twice may produce different outputs. That is by design, because the probabilistic nature of generative AI allows for a wider range of creative output. It also makes red teaming tricky, since a prompt may not lead to failure on the first attempt but succeed (in surfacing security threats or RAI harms) on a later attempt. One way we have accounted for this, as Brad Smith mentioned in his blog, is to pursue multiple rounds of red teaming in the same operation. Microsoft has also invested in automation that helps to scale our operations, and in a systematic measurement strategy that quantifies the extent of the risk (see the sketch after this list for a minimal illustration of repeated probing).
- Mitigating AI failures requires defense in depth. Just as in traditional security, where a problem like phishing requires a variety of technical mitigations ranging from hardening the host to smartly identifying malicious URIs, fixing failures found via AI red teaming requires a defense-in-depth approach, too. This ranges from using classifiers to flag potentially harmful content, to using the metaprompt to guide behavior, to limiting conversational drift in conversational scenarios.
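To make the last two learnings concrete, here is a minimal sketch of what repeated probing with one mitigation layer can look like. All names are hypothetical: `generate` and `classify_harm` are simulated stand-ins, not Microsoft or Azure OpenAI APIs, and a real harness would call your own model endpoint and content filters.

```python
import random
from collections import Counter

def generate(prompt: str) -> str:
    """Simulated probabilistic model: the same prompt can yield different outputs."""
    return random.choice(["benign answer", "borderline answer", "harmful answer"])

def classify_harm(text: str) -> bool:
    """Simulated classifier layer that flags potentially harmful content."""
    return "harmful" in text

def probe(prompt: str, attempts: int = 20) -> Counter:
    """Run the same probe repeatedly and tally outcomes, since a single
    attempt can miss a failure that only surfaces occasionally."""
    results = Counter()
    for _ in range(attempts):
        output = generate(prompt)
        flagged = classify_harm(output)  # one defense-in-depth layer
        results["flagged" if flagged else "passed"] += 1
    return results

if __name__ == "__main__":
    tally = probe("adversarial test prompt")
    total = sum(tally.values())
    print(f"flagged {tally['flagged']} of {total} attempts "
          f"({tally['flagged'] / total:.0%} failure rate)")
```

In a real engagement, the recorded failure rate would feed the kind of systematic measurement described above, and the classifier check is only one layer; metaprompt guidance and limits on conversational drift would sit alongside it.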
Building technology responsibly and securely is in Microsoft's DNA. Last year, Microsoft celebrated the 20-year anniversary of the Trustworthy Computing memo that asked Microsoft to deliver products "as available, reliable and secure as standard services such as electricity, water services, and telephony." AI is shaping up to be the most transformational technology of the 21st century. And like any new technology, AI is subject to novel threats. Earning customer trust by safeguarding our products remains a guiding principle as we enter this new era, and the AI Red Team is front and center of this effort. We hope this blog post inspires others to responsibly and securely integrate AI via red teaming.
Resources
AI red teaming is part of the broader Microsoft strategy to ship AI systems securely and responsibly. Here are some other resources that provide insight into this process:
- For customers building applications using Azure OpenAI models, we released a guide to help them assemble an AI red team, define scope and goals, and execute on the deliverables.
- For security incident responders, we released a bug bar to systematically triage attacks on ML systems.
- For ML engineers, we released a checklist for completing AI risk assessments.
- For developers, we released threat modeling guidance specifically for ML systems.
- For anyone interested in learning more about responsible AI, we have released a version of our Responsible AI Standard and Impact Assessment, among other resources.
- For engineers and policymakers, Microsoft, in collaboration with the Berkman Klein Center at Harvard University, released a taxonomy documenting various machine learning failure modes.
- For the broader security community, Microsoft hosted the annual Machine Learning Evasion Competition.
- For Azure Machine Learning customers, we provided guidance on enterprise security and governance.
Contributions from Steph Ballard, Forough Poursabzi, Amanda Minnich, Gary Lopez Munoz, and Chang Kawaguchi.