AI-based coding assistants are quickly becoming a must-have for developers, and among the most popular is AWS CodeWhisperer.
It works with integrated development environments (IDEs) and has multi-language support, advanced security, and customization features—plus additional benefits if you're within the AWS ecosystem.
But when you implement AI tools like these, there are real risks you need to be aware of and address. From intellectual property and data privacy issues to suggestions containing insecure or outdated implementations, things can go wrong quickly. Beyond general frustration and extra work, you may face legal problems, and if your business is at risk, livelihoods could be too.
Handled correctly, though, CodeWhisperer can be a lifesaver. In this blog post, I will walk you through the risks at a basic level and show how to tackle them effectively.
How CodeWhisperer works its magic
As you type, CodeWhisperer gives you real-time coding suggestions—from single-line completions to full code blocks. But how does it do that?
To do so accurately:
- CodeWhisperer collects relevant context from your IDE, including source code snippets and filenames.
- It sends the information to the AWS cloud, where AWS uses large language models (LLMs) to process the request and create suggestions for you.
You can get more relevant code suggestions from the LLM by using CodeWhisperer’s customization feature, which makes the LLM aware of additional code outside of its original training data, such as internal libraries and APIs, best practices, and architectural patterns.
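As a minimal sketch of what that setup can look like, the snippet below stages an archive of internal libraries in S3 so an administrator can point a customization at it as a data source. The bucket name and archive name are placeholders, and your organization may use a different data source entirely:

```python
import boto3

# Hypothetical names; replace with your own bucket and archive.
BUCKET = "my-codewhisperer-customization-data"
ARCHIVE = "internal-libs.zip"  # ZIP of internal libraries and APIs

# boto3 talks to AWS over HTTPS (TLS) endpoints by default.
s3 = boto3.client("s3")

# Upload the archive; an administrator then points the CodeWhisperer
# customization at this S3 location as its data source.
s3.upload_file(ARCHIVE, BUCKET, f"customizations/{ARCHIVE}")
```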
Because CodeWhisperer operates on sensitive data, it’s important to protect that information when it is in transit and at rest.
CodeWhisperer security and how to get it right
CodeWhisperer falls under the AWS shared responsibility model: AWS is responsible for securing the infrastructure of its platform, while you are responsible for using and configuring that platform securely based on your needs.
With that in mind, what can you, as a user of CodeWhisperer, do to harden the default configuration? Here are the main ways to provide a secure configuration for your organization:
- Enable multi-factor authentication (MFA) for all CodeWhisperer users. This reduces the risk that a compromised account leaks the contents of your customizations.
- Use a modern version of TLS when interacting directly with AWS services, for example, when uploading customization files to S3 (the first sketch after this list enforces TLS-only access on the bucket).
- Enable API and user activity logging with AWS CloudTrail so you have better visibility into how the tools are used (see the second sketch after this list).
- For additional security for customization files at rest, enable customer managed AWS KMS keys on the S3 buckets holding those files (also shown in the first sketch after this list). If you don’t configure a customer managed key, AWS still encrypts the data at rest with a key it manages for you.
- Use Amazon Macie to monitor the S3 buckets with customization files. Macie evaluates the buckets’ access controls and scans for sensitive content like personally identifiable information (PII); the last sketch after this list schedules such a scan.
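To make the TLS and encryption-at-rest items concrete, here is a minimal boto3 sketch that denies non-TLS access to a customization bucket and enables default encryption with a customer managed KMS key. The bucket name and key ARN are placeholders, and many organizations will prefer to apply the same settings through infrastructure as code:

```python
import json
import boto3

BUCKET = "my-codewhisperer-customization-data"  # placeholder bucket name
KMS_KEY_ARN = "arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE"  # placeholder

s3 = boto3.client("s3")

# 1. Deny any request to the bucket that does not use TLS.
tls_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(tls_only_policy))

# 2. Encrypt objects at rest with a customer managed KMS key by default.
s3.put_bucket_encryption(
    Bucket=BUCKET,
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": KMS_KEY_ARN,
            }
        }]
    },
)
```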
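For the CloudTrail item, a sketch along the same lines creates a multi-region trail and starts logging. The trail and bucket names are placeholders, and the target bucket must already have a policy that allows CloudTrail to write to it:

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Placeholder names; the log bucket must grant CloudTrail write access.
TRAIL_NAME = "org-activity-trail"
LOG_BUCKET = "my-cloudtrail-logs"

# Record API and user activity across all regions in the account.
cloudtrail.create_trail(
    Name=TRAIL_NAME,
    S3BucketName=LOG_BUCKET,
    IsMultiRegionTrail=True,
)
cloudtrail.start_logging(Name=TRAIL_NAME)
```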
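And for the Macie item, the sketch below enables Macie and schedules a daily sensitive data discovery job on the customization bucket. The account ID, bucket name, and job name are placeholders:

```python
import boto3

macie = boto3.client("macie2")

ACCOUNT_ID = "111122223333"                     # placeholder account ID
BUCKET = "my-codewhisperer-customization-data"  # placeholder bucket name

# Enable Macie in this account and region (skip if it is already enabled).
macie.enable_macie()

# Schedule a daily sensitive data discovery job on the customization bucket.
macie.create_classification_job(
    name="codewhisperer-customization-scan",
    jobType="SCHEDULED",
    scheduleFrequency={"dailySchedule": {}},
    s3JobDefinition={
        "bucketDefinitions": [{
            "accountId": ACCOUNT_ID,
            "buckets": [BUCKET],
        }]
    },
)
```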
CodeWhisperer has a built-in tool for security scanning. It can scan across various programming languages, including Java, Python, JavaScript, and others. Under the hood, CodeWhisperer’s code scanning is powered by the Amazon CodeGuru Detector Library.
General security best practices for adopting CodeWhisperer
Make sure your developers know the limitations of AI
When working with AI tools, always remember that you are responsible for the outcomes. The LLM is merely a tool to aid developers in their work; it is not an autonomous robot replacing them completely.
Suggestions coming from any AI tool, including CodeWhisperer, can look like good contributions at first glance. But they may be far from meeting good code quality standards.
- There could be subtle bugs or security vulnerabilities.
- The tool may implement something different from what you (the developer) intended.
- The tool may fail to incorporate design requirements (because they may not have been available to the AI).
Therefore, it is absolutely crucial that you train every developer on the weaknesses of AI. This includes biases in the training data that lead to skewed responses, “hallucinations” where the AI confidently produces suggestions that are not functional, and possible intellectual property (IP) conflicts when you use the AI output directly.
Know where the code suggestions come from
If you have a Professional Tier license for CodeWhisperer, you are guaranteed that CodeWhisperer will not use your content to train the underlying LLM. But the code that CodeWhisperer returns might still have similarities to public code from its training data.
To deal with these kinds of suggestions safely, CodeWhisperer provides a public code filter and reference tracking. If a code suggestion matches public code, you are notified where the matching code originates, so you can quickly decide whether its license is compatible with the code base you are working on.
To prevent CodeWhisperer from suggesting snippets that match public code altogether, disable suggestions with code references at the organization level.
Security-scan across languages
One of CodeWhisperer’s great built-in security mechanisms, which I hinted at earlier, is its tool for running security scans. To use it optimally, set up code scanning to run at regular intervals.
For specific languages (Java, JavaScript, and Python as of writing), CodeWhisperer can suggest fixes for the vulnerabilities it finds. But be aware that code written in programming languages not supported by CodeGuru is silently ignored: you will not receive vulnerability findings for those parts of the code base, nor any warning that they could not be processed.
Don’t make coding harder than it has to be
Coding assistants are becoming indispensable for developers, and AWS CodeWhisperer is a favorite among many of them. However, LLMs have limitations and risks: sensitive data, like source code and potential secrets, is sent directly to the cloud, and you need to know how to integrate the LLM’s suggestions into your production source code safely and compliantly.
Adopting CodeWhisperer doesn’t have to cause much friction. Combine hardened configuration options in your AWS organization’s administration console with proper user onboarding that teaches developers how to consume LLM-based solutions securely and effectively.
Life and coding can be easier. Don’t make it harder than it has to be.
Published: Apr 29, 2024