@VitalC-3026
Contributor

Hi @lewtun @stevhliu,
I'd like to start a Chinese translation effort to bring this incredible chapter to more Chinese-speaking programmers and RL learners. This PR contains the first draft of Chapter 12, Section 1.
I'm submitting this early for a couple of reasons:

  • Coordination – To check if anyone else is already working on translations, so we can avoid duplicate efforts.
  • Build verification – I've been running into issues with the doc-builder—not just for my translated pages, but also for the original English pages. I'd appreciate any guidance on whether this is a known issue or something on my end.

Looking forward to your feedback!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@stevhliu left a comment

Thanks for your interest in translating the course!

I think it may be better to not have a separate img folder in chapters/zh-CN, and instead, you can link to the images in chapters/en to avoid making the repo too heavy.

Member

@stevhliu left a comment

Cool, just one missing header and then we can merge!


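For large language models, rewards are designed to reflect how well the model performs on a specific task, such as whether a response is helpful, truthful, and harmless.

This is the policy: the strategy the agent follows when choosing actions. Just as a puppy needs to understand what it should do when you say "sit," the policy is the core thing we are actually trying to learn and improve in reinforcement learning. It can be a set of rules or a function that tells the agent which action to take in a given situation. At first the policy may be random, but as the agent keeps learning, it gets better at choosing actions that lead to higher rewards.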
Member

I think you're missing the ### Policy header somewhere here

Contributor Author

Thank you for the checks! Updated.

@stevhliu merged commit 1f2476b into huggingface:main on Dec 8, 2025
2 checks passed