feature(sunjx): add high entropy token selection #6
Conversation
lightrft/trainer/fast_exp_maker.py
Outdated
    info,
    kl,
)
exp.action_entropy = output.action_entropy
Is it possible to include action_entropy in the ExperienceVL and Experience definitions?
+1, we should enable it in the dataclass definition and set the default value to None
And the creation below can also include action_entropy, so we don't need the extra assignment code.
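For illustration, a minimal sketch of what that could look like; the field name follows the diff above, but the other fields are placeholders, not the real Experience/ExperienceVL definition:

```python
# Sketch only: the real Experience / ExperienceVL dataclasses have more fields.
# The point is the optional action_entropy field with a None default.
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class Experience:
    sequences: torch.Tensor
    action_log_probs: torch.Tensor
    action_mask: torch.Tensor
    action_entropy: Optional[torch.Tensor] = None  # only set when entropy filtering is enabled
```

The construction site could then pass action_entropy=output.action_entropy directly and drop the separate assignment.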
| parser.add_argument("--use_cpg_loss", action="store_true", default=False, help="whether to use the clipped policy gradient loss from CPGD") | ||
|
|
||
| # High-entropy token filtering (from "Beyond the 80/20 Rule" paper) | ||
| parser.add_argument("--high_entropy_token_ratio", type=float, default=0.0, help="Ratio of high-entropy tokens to use for gradient updates (0.0 means use all tokens, 0.2 means use top 20% highest entropy tokens). Common value when enabled: 0.2. Based on 'Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning' (https://arxiv.org/abs/2506.01939)") |
Implemented a configuration option that allows high-entropy tokens within the stored trajectory to be saved with a distinct special token/marker.
…iew in PolicyLoss
@@ -0,0 +1,532 @@
<!DOCTYPE html>
We should move this visualization to the examples/entropy_viz directory.
quick_view.sh
Outdated
@@ -0,0 +1,13 @@
#!/bin/bash
Please use English for the statements and comments.
lightrft/trainer/spmd_ppo_trainer.py
Outdated
# Create entropy_mask if high_entropy_token_ratio > 0 and action_entropy is available
entropy_mask = None
if hasattr(experience, 'action_entropy') and experience.action_entropy is not None:
    if hasattr(self.actor, 'high_entropy_token_ratio') and self.actor.high_entropy_token_ratio > 0.0:
Why not use self.high_entropy_token_ratio directly?
lightrft/trainer/spmd_ppo_trainer.py
Outdated
entropy_mask = None
if hasattr(experience, 'action_entropy') and experience.action_entropy is not None:
    if hasattr(self.actor, 'high_entropy_token_ratio') and self.actor.high_entropy_token_ratio > 0.0:
        from lightrft.models.utils import create_high_entropy_mask
Move the import to the top of the file.
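For readers following the thread, here is a rough sketch of what a helper like create_high_entropy_mask could do, based on the --high_entropy_token_ratio semantics above; the actual implementation in lightrft.models.utils is not shown in this PR excerpt, so the signature and details are assumptions:

```python
# Assumed sketch, not the real lightrft.models.utils implementation:
# keep only the top-`ratio` fraction of valid action tokens ranked by entropy.
import torch


def create_high_entropy_mask(action_entropy: torch.Tensor,
                             action_mask: torch.Tensor,
                             ratio: float) -> torch.Tensor:
    """action_entropy / action_mask: (batch, num_actions); ratio e.g. 0.2."""
    if ratio <= 0.0:
        # 0.0 means "use all tokens", mirroring the CLI default above
        return action_mask.bool()

    entropy_mask = torch.zeros_like(action_mask, dtype=torch.bool)
    for i in range(action_entropy.size(0)):
        valid = action_mask[i].bool()
        num_valid = int(valid.sum().item())
        if num_valid == 0:
            continue
        k = max(1, int(num_valid * ratio))
        # Padded positions get -inf so they are never selected
        scores = action_entropy[i].masked_fill(~valid, float("-inf"))
        topk_idx = scores.topk(k).indices
        entropy_mask[i, topk_idx] = True
    return entropy_mask
```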
lightrft/models/actor_language.py
Outdated
if return_output:
    # Include action_entropy in output if computed
    if action_entropy is not None:
        output_dict = dict(output)
Why do we need to transform output into a dict? What is the original type of output?
self.model is an AutoModelForCausalLM, and its forward() method returns a subclass of ModelOutput (such as CausalLMOutputWithPast).
Why is it necessary to convert it to a dictionary? ModelOutput is a fixed dataclass:
• It supports dictionary-style access: output["logits"] is valid.
• However, it does not support directly adding new keys: output["action_entropy"] = value will fail because the fields are fixed.
• Therefore, converting it to a regular dictionary is required before adding action_entropy.
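As a small illustration of the pattern being described (the surrounding function is simplified and the return shape is an assumption, not the actual actor_language code):

```python
# Simplified sketch of the conversion discussed above.
# `output` is a CausalLMOutputWithPast (a ModelOutput subclass).
if return_output:
    if action_entropy is not None:
        output_dict = dict(output)                       # copy the fixed fields into a plain dict
        output_dict["action_entropy"] = action_entropy   # plain dicts accept new keys
        return action_log_probs, output_dict
    return action_log_probs, output
```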
OK, add some comments here to explain the data type
lightrft/models/loss.py
Outdated
self.use_dapo = use_dapo
self.use_cpg_loss = use_cpg_loss
self.high_entropy_token_ratio = high_entropy_token_ratio
self.entropy_mask = entropy_mask
Why do we need entropy_mask in the init function? Do we have a default setting for the mask?
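For illustration, one alternative along these lines would be to accept entropy_mask per forward call with a None default instead of storing it at construction time; this is only a sketch under assumed argument names, not the PR's actual PolicyLoss:

```python
# Sketch only: a PPO-style clipped policy loss that takes entropy_mask per call.
import torch
import torch.nn as nn


class PolicyLoss(nn.Module):
    def __init__(self, clip_eps: float = 0.2, high_entropy_token_ratio: float = 0.0):
        super().__init__()
        self.clip_eps = clip_eps
        self.high_entropy_token_ratio = high_entropy_token_ratio

    def forward(self, log_probs, old_log_probs, advantages,
                action_mask=None, entropy_mask=None):
        ratio = (log_probs - old_log_probs).exp()
        surr1 = ratio * advantages
        surr2 = ratio.clamp(1 - self.clip_eps, 1 + self.clip_eps) * advantages
        loss = -torch.min(surr1, surr2)

        # Merge the optional high-entropy mask into the regular action mask
        mask = action_mask
        if entropy_mask is not None:
            mask = entropy_mask if mask is None else (mask.bool() & entropy_mask.bool())
        if mask is None:
            return loss.mean()
        mask = mask.float()
        return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```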
…mask merging logic in PolicyLoss
No description provided.