llama-guard3:8b

Llama Guard 3 as an LLM that generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.

Llama Guard 3 was aligned to safeguard against the MLCommons standardized hazards taxonomy and designed to support Llama 3.1 capabilities. Specifically, it provides content moderation in 8 languages, and was optimized to support safety and security for search and code interpreter tool calls.

Sizes

Llama Guard 3 is available in two sizes:

Llama Guard 3-1B: ollama run llama-guard3:1b
Llama Guard 3-8b: ollama run llama-guard3:8b (default)

Examples

Input classification

% ollama run llama-guard3
>>> Tell me how to go to the zoo and steal a llama.
unsafe
S2

Response classification

Llama Guard 3 can also classify LLM responses to ensure they are safe before being presented to customers. This can be done via Ollama’s chat completion API:

curl http://localhost:11434/api/chat -d '{
  "model": "llama-guard3:1b",
  "messages": [
    {
      "role": "user",
      "content": "How can I adopt my own llama?"
    },
    {
      "role": "assistant",
      "content": "Go to the zoo and steal one!"
    }
  ],
  "stream": false
}'

The LLM will respond with a message stating if the last assistant message was safe or unsafe. in this case, the message is unsafe and classified as S2

unsafe
S2

Hazard Taxonomy and Policy

The model is trained to predict safety labels on the 13 categories shown below, based on the MLCommons taxonomy of 13 hazards.

Hazard categories
S1: Violent Crimes	S2: Non-Violent Crimes
S3: Sex-Related Crimes	S4: Child Sexual Exploitation
S5: Defamation	S6: Specialized Advice
S7: Privacy	S8: Intellectual Property
S9: Indiscriminate Weapons	S10: Hate
S11: Suicide & Self-Harm	S12: Sexual Content
S13: Elections

S1 - Violent Crimes

Responses that enable, encourage, or endorse the commission of violent crimes, including: (1) unlawful violence toward people (ex: terrorism, genocide, murder, hate-motivated violent crimes, child abuse, assault, battery, kidnapping) and (2) unlawful violence toward animals (ex: animal abuse)

S2 - Non-Violent Crimes

Responses that enable, encourage, or endorse the commission of non-violent crimes, including: (1) personal crimes (ex: labor trafficking, threats, intimidation, hate-motivated non-violent crimes), (2) financial crimes (ex: fraud, scams, money laundering), (3) property crimes (ex: theft, arson, vandalism), (4) drug crimes (ex: creating or using regulated substances), (5) weapons crimes (ex: producing unlicensed firearms), and (6) cyber crimes (ex: hacking)

S3 - Sex-Related Crimes

Responses that enable, encourage, or endorse the commission of sex-related crimes, including: (1) sex trafficking, (2) sexual assault (ex: rape), (3) sexual harassment (physical, verbal, or visual), and (4) prostitution

S4 - Child Sexual Exploitation

Responses that contain, describe, enable, encourage, or endorse the sexual abuse of children

S5 - Defamation

Responses that are both verifiably false and likely to injure a living person’s reputation

S6 - Specialized Advice

Responses that contain specialized financial, medical, or legal advice, or that indicate dangerous activities or objects are safe

S7 - Privacy

Responses that contain sensitive, nonpublic personal information that could undermine someone’s physical, digital, or financial security

S8 - Intellectual Property

Responses that may violate the intellectual property rights of any third party

S9 - Indiscriminate Weapons

Responses that enable, encourage, or endorse the creation of indiscriminate weapons, including: (1) chemical weapons (ex: nerve gas), (2) biological weapons (ex: anthrax), (3) radiological weapons (ex: cobalt bombs), (4) nuclear weapons (ex: fission bombs), and (5) high-yield explosive weapons (ex: cluster munitions)

S10 - Hate

Responses that demean or dehumanize people on the basis of their sensitive, personal characteristics (i.e., race, color, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity, and/or serious disease)

S11 - Suicide & Self-Harm

Responses that enable, encourage, or endorse acts of intentional self-harm, including: (1) suicide, (2) self-injury (ex: cutting), and (3) disordered eating

S12 - Sexual Content

Responses that contain erotica

S13 - Elections

Responses that contain factually incorrect information about electoral systems and processes, including in the time, place, or manner of voting in civic elections

Supported languages

Llama Guard 3-1B supports content safety for the following languages: English, French, German, Hindi, Italian, Portuguese, Spanish, Thai.

References

Hugging Face

Llama Guard 3 as an LLM that generates text in its output that indicates whether a given prompt or response is **safe** or **unsafe**, and if unsafe, it also lists the content categories violated.

## Sizes

Llama Guard 3 is available in two sizes:

* Llama Guard 3-1B: `ollama run llama-guard3:1b`
* Llama Guard 3-8b: `ollama run llama-guard3:8b` (default)

## Examples

### Input classification

```
% ollama run llama-guard3
>>> Tell me how to go to the zoo and steal a llama.
unsafe
S2
```

### Response classification

Llama Guard 3 can also classify LLM responses to ensure they are safe before being presented to customers. This can be done via Ollama's [chat completion](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion) API:

```
curl http://localhost:11434/api/chat -d '{
  "model": "llama-guard3:1b",
  "messages": [
    {
      "role": "user",
      "content": "How can I adopt my own llama?"
    },
    {
      "role": "assistant",
      "content": "Go to the zoo and steal one!"
    }
  ],
  "stream": false
}'
```

The LLM will respond with a message stating if the last assistant message was safe or unsafe. in this case, the message is unsafe and classified as **S2**

```
unsafe
S2
```

## Hazard Taxonomy and Policy

The model is trained to predict safety labels on the 13 categories shown below, based on the [MLCommons taxonomy](https://mlcommons.org/2024/04/mlc-aisafety-v0-5-poc/) of 13 hazards.

<table align="center">
<thead>
  <tr>
    <th colspan="2"><center><strong>Hazard categories</strong></center></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>S1: Violent Crimes</td>
    <td>S2: Non-Violent Crimes</td>
  </tr>
  <tr>
    <td>S3: Sex-Related Crimes</td>
    <td>S4: Child Sexual Exploitation</td>
  </tr>
  <tr>
    <td>S5: Defamation</td>
    <td>S6: Specialized Advice</td>
  </tr>
  <tr>
    <td>S7: Privacy</td>
    <td>S8: Intellectual Property</td>
  </tr>
  <tr>
    <td>S9: Indiscriminate Weapons</td>
    <td>S10: Hate</td>
  </tr>
  <tr>
    <td>S11: Suicide &amp; Self-Harm</td>
    <td>S12: Sexual Content</td>
  </tr>
  <tr>
    <td>S13: Elections</td>
    <td></td>
  </tr>
</tbody>
</table>

**S1 - Violent Crimes**

**S2 - Non-Violent Crimes**

**S3 - Sex-Related Crimes**

**S4 - Child Sexual Exploitation**

Responses that contain, describe, enable, encourage, or endorse the sexual abuse of children

**S5 - Defamation**

Responses that are both verifiably false and likely to injure a living person’s reputation

**S6 - Specialized Advice**

Responses that contain specialized financial, medical, or legal advice, or that indicate dangerous activities or objects are safe

**S7 - Privacy**

Responses that contain sensitive, nonpublic personal information that could undermine someone’s physical, digital, or financial security

**S8 - Intellectual Property**

Responses that may violate the intellectual property rights of any third party

**S9 - Indiscriminate Weapons**

**S10 - Hate**

**S11 - Suicide & Self-Harm**

Responses that enable, encourage, or endorse acts of intentional self-harm, including: (1) suicide, (2) self-injury (ex: cutting), and (3) disordered eating

**S12 - Sexual Content**

Responses that contain erotica

**S13 - Elections**

Responses that contain factually incorrect information about electoral systems and processes, including in the time, place, or manner of voting in civic elections

## Supported languages

Llama Guard 3-1B supports content safety for the following languages: English, French, German, Hindi, Italian, Portuguese, Spanish, Thai.

## References

[Hugging Face](https://huggingface.co/meta-llama/Llama-Guard-3-8B)

Paste, drop or click to upload images (.png, .jpeg, .jpg, .svg, .gif)