Conversation

@Jiaxuan-Sun

No description provided.

    info,
    kl,
)
exp.action_entropy = output.action_entropy
Collaborator:

Is it possible to include action_entropy in the ExperienceVL and Experience definitions?

Member:

+1, we should enable it in the dataclass definition and set the default value to None
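Concretely, the suggested change could look like this sketch (only the new field is shown; the existing Experience fields are elided, and the tensor annotation is kept as a string so the sketch stays self-contained):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Experience:
    # ...existing fields elided...
    # Per-token entropies from the rollout; defaults to None so code
    # paths that never compute entropy keep working unchanged.
    action_entropy: Optional["torch.Tensor"] = None
```

With a None default, existing constructor call sites need no changes, and downstream code can simply check `if experience.action_entropy is not None`.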

Member:

And the creation below can also include action_entropy, so we don't need the extra assignment code.

@puyuan1996 puyuan1996 added the enhancement New feature or request label Jan 4, 2026
parser.add_argument("--use_cpg_loss", action="store_true", default=False, help="whether to use the clipped policy gradient loss from CPGD")

# High-entropy token filtering (from "Beyond the 80/20 Rule" paper)
parser.add_argument("--high_entropy_token_ratio", type=float, default=0.0, help="Ratio of high-entropy tokens to use for gradient updates (0.0 means use all tokens, 0.2 means use top 20% highest entropy tokens). Common value when enabled: 0.2. Based on 'Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning' (https://arxiv.org/abs/2506.01939)")
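For reference, selecting the top-`ratio` fraction of tokens by entropy can be sketched as follows (an illustrative implementation, not necessarily identical to lightrft's create_high_entropy_mask):

```python
import torch

def high_entropy_mask(entropy: torch.Tensor, ratio: float) -> torch.Tensor:
    """Boolean mask keeping the `ratio` fraction of highest-entropy tokens.

    entropy: (batch, seq_len) per-token entropies, ratio in (0, 1].
    Callers are expected to skip masking entirely when ratio == 0.0.
    """
    k = max(1, int(entropy.numel() * ratio))
    # Threshold at the k-th largest entropy over all tokens in the batch,
    # mirroring the global top-20% selection from the 80/20-rule paper.
    threshold = torch.topk(entropy.flatten(), k).values.min()
    return entropy >= threshold
```

The mask can then be multiplied into the per-token policy-gradient loss so that only high-entropy tokens contribute gradients.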
Collaborator:

Implemented a configuration option that allows high-entropy tokens within the stored trajectory to be saved with a distinct special token/marker.
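A rough sketch of what such an option could look like (the function name, signature, and the `<HE>` marker string are all illustrative, not the PR's actual implementation):

```python
def mark_high_entropy_tokens(tokens, entropies, ratio, marker="<HE>"):
    """Prefix the top-`ratio` highest-entropy tokens with a marker.

    tokens and entropies are parallel lists for one stored trajectory;
    with ratio <= 0 the trajectory is returned unmarked.
    """
    if ratio <= 0.0:
        return list(tokens)
    k = max(1, int(len(tokens) * ratio))
    top = set(sorted(range(len(tokens)),
                     key=lambda i: entropies[i], reverse=True)[:k])
    return [marker + tok if i in top else tok for i, tok in enumerate(tokens)]
```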

@@ -0,0 +1,532 @@
<!DOCTYPE html>
Member:

We should move this visualization to the examples/entropy_viz directory.

quick_view.sh Outdated
@@ -0,0 +1,13 @@
#!/bin/bash
Member:

Use English statements and comments.

# Create entropy_mask if high_entropy_token_ratio > 0 and action_entropy is available
entropy_mask = None
if hasattr(experience, 'action_entropy') and experience.action_entropy is not None:
    if hasattr(self.actor, 'high_entropy_token_ratio') and self.actor.high_entropy_token_ratio > 0.0:
Member:

Why not use self.high_entropy_token_ratio directly?

entropy_mask = None
if hasattr(experience, 'action_entropy') and experience.action_entropy is not None:
    if hasattr(self.actor, 'high_entropy_token_ratio') and self.actor.high_entropy_token_ratio > 0.0:
        from lightrft.models.utils import create_high_entropy_mask
Member:

Move the import to the top of the file.


if return_output:
    # Include action_entropy in output if computed
    if action_entropy is not None:
        output_dict = dict(output)
Member:

Why do we need to transform output into a dict? What is the original type of output?

Author:

self.model is an AutoModelForCausalLM, and its forward() method returns a subclass of ModelOutput (such as CausalLMOutputWithPast).

Why is it necessary to convert it to a dictionary?

- ModelOutput is a fixed dataclass.
- It supports dictionary-style access: output["logits"] is valid.
- However, it does not support directly adding new keys: output["action_entropy"] = value will fail, because the fields are fixed.
- Therefore, converting it to a regular dictionary is required to add action_entropy.
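The pattern under discussion, shown with a minimal stand-in class instead of a real CausalLMOutputWithPast (transformers' ModelOutput subclasses OrderedDict, which is why dict(output) works for the copy):

```python
from collections import OrderedDict

class FakeModelOutput(OrderedDict):
    """Stand-in for transformers' ModelOutput: dict-style reads are fine,
    but we treat its key set as fixed, as described above."""

output = FakeModelOutput(logits=[[0.1, 0.9]], past_key_values=None)
assert output["logits"] == [[0.1, 0.9]]  # dict-style read access works

# Copy into a plain dict; the copy accepts arbitrary new keys.
output_dict = dict(output)
output_dict["action_entropy"] = [0.42]
```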

Member:

OK, add some comments here to explain the data type.

self.use_dapo = use_dapo
self.use_cpg_loss = use_cpg_loss
self.high_entropy_token_ratio = high_entropy_token_ratio
self.entropy_mask = entropy_mask
Member:

Why do we need entropy_mask in the init function? Do we have a default setting for the mask?
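One way to resolve this would be to keep entropy_mask in __init__ but default it to None and fall back to the plain action mask when unset (a sketch with assumed names, not the PR's actual code):

```python
from typing import Optional

import torch

class PolicyLoss:
    """Sketch: entropy_mask defaults to None, i.e. masking is disabled."""

    def __init__(self, high_entropy_token_ratio: float = 0.0,
                 entropy_mask: Optional[torch.Tensor] = None):
        self.high_entropy_token_ratio = high_entropy_token_ratio
        self.entropy_mask = entropy_mask

    def effective_mask(self, action_mask: torch.Tensor) -> torch.Tensor:
        # Without an entropy mask, gradients flow through every action token.
        if self.entropy_mask is None:
            return action_mask
        return action_mask & self.entropy_mask
```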
