@Postur Postur commented Jan 21, 2026

issue:
If an attribute value contains escaped characters, it is not unescaped when using the quick_xml deserializer.

reproduce:

  1. add &gt; to the end of "foo"
image
  2. run the read_write example and examine output.xml
image

note: the & got serialized to &amp; correctly.

what we expected:
image

fix:
When read_attrib<T>() deserializes a String through String::deserialize_bytes(),
we want to get the unescaped value back.

file: xsd-parser-types/src/quick_xml/deserialize.rs

    fn deserialize_bytes(helper: &mut DeserializeHelper, bytes: &[u8]) -> Result<Self, Error> {
        let s = from_utf8(bytes).map_err(Error::from)?;

    -   Self::deserialize_str(helper, s)
    +   Self::deserialize_str(helper, &unescape(s)?)
    }
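
To make the symptom concrete, here is a minimal sketch of the round trip. `escape_min` is a hypothetical stand-in for the serializer's attribute escaping, not the quick_xml implementation, and the stored values are written by hand to mimic what the deserializer would keep with and without unescaping.

```rust
/// Hypothetical stand-in for the serializer's attribute escaping
/// (not the quick_xml implementation): replaces the five XML special characters.
fn escape_min(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            other => out.push(other),
        }
    }
    out
}

fn main() {
    // Without unescaping, the raw attribute text `foo&gt;` is stored as-is,
    // so writing it back escapes the `&` a second time.
    let stored_without_unescape = "foo&gt;";
    println!("{}", escape_min(stored_without_unescape));

    // With unescaping, the stored value is `foo>`, which is written back
    // in the same form the input document used.
    let stored_with_unescape = "foo>";
    println!("{}", escape_min(stored_with_unescape));
}
```

The first case is the double-escaped attribute from the reproduction, the second is the expected output.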

Owner

@Bergmann89 Bergmann89 left a comment

Hey @Postur, thanks for the fix. Could you also add a simple unit test for this instead of using the example?

@Postur
Author

Postur commented Jan 22, 2026

Sorry, I missed the tests. This fix requires more than I first thought.

read_attrib<T>() uses T::deserialize_bytes(), and if the attribute was defined as an "xs:string" type in the XSD, then T = String.

However, String::deserialize_bytes() is also used for other string content that must not be unescaped, such as CDATA.

If we unescape in read_attrib() instead, we introduce more error cases into that function:

        let value = unescape(from_utf8(value)?)?;

        let value = T::deserialize_bytes(self, value.as_bytes())?;

But converting, unescaping, and then converting again feels dirty.

@Bergmann89
Owner

You can use T::deserialize_str instead of T::deserialize_bytes. T::deserialize_str has a default implementation that simply forwards to T::deserialize_bytes. As you already mentioned, this feels somewhat dirty, but string-capable types can provide a custom implementation of T::deserialize_str so that the UTF-8 conversion does not have to be done twice.
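
A simplified sketch of that pattern, for illustration only: `DeserializeBytesSketch` is a hypothetical trait that drops the helper argument and uses `String` as the error type, so it is not the crate's actual `DeserializeBytes` definition.

```rust
use std::str::from_utf8;

/// Hypothetical, simplified trait: no helper argument, `String` errors.
trait DeserializeBytesSketch: Sized {
    fn deserialize_bytes(bytes: &[u8]) -> Result<Self, String>;

    /// Default implementation: forward to the byte-based method.
    fn deserialize_str(s: &str) -> Result<Self, String> {
        Self::deserialize_bytes(s.as_bytes())
    }
}

/// A non-string type keeps the default `deserialize_str`.
impl DeserializeBytesSketch for u32 {
    fn deserialize_bytes(bytes: &[u8]) -> Result<Self, String> {
        let s = from_utf8(bytes).map_err(|e| e.to_string())?;
        s.trim().parse::<u32>().map_err(|e| e.to_string())
    }
}

/// A string-capable type overrides `deserialize_str`, so a caller that
/// already holds a `&str` does not pay for a second UTF-8 validation.
impl DeserializeBytesSketch for String {
    fn deserialize_bytes(bytes: &[u8]) -> Result<Self, String> {
        let s = from_utf8(bytes).map_err(|e| e.to_string())?;
        Self::deserialize_str(s)
    }

    fn deserialize_str(s: &str) -> Result<Self, String> {
        Ok(s.to_owned())
    }
}

fn main() {
    println!("{:?}", <u32 as DeserializeBytesSketch>::deserialize_str("42"));
    println!("{:?}", <String as DeserializeBytesSketch>::deserialize_str("foo>"));
}
```

With this shape, read_attrib can unescape once and hand the resulting string to `deserialize_str`, while CDATA and other callers of `deserialize_bytes` stay untouched.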

I ran a quick test on my side, and this works as intended:

    /// Helper function to convert and store an attribute from the XML event.
    ///
    /// Since attributes need to be escaped, this will convert the passed
    /// `value` bytes to a UTF-8 string, unescape it and then deserialize
    /// it using the [`DeserializeBytes::deserialize_str`] method.
    ///
    /// # Errors
    ///
    /// Returns an [`struct@Error`] with [`ErrorKind::DuplicateAttribute`] if `store`
    /// already contained a value.
    pub fn read_attrib<T>(
        &mut self,
        store: &mut Option<T>,
        name: &'static [u8],
        value: &[u8],
    ) -> Result<(), Error>
    where
        T: DeserializeBytes,
    {
        if store.is_some() {
            Err(ErrorKind::DuplicateAttribute(RawByteStr::from(name)))?;
        }

        let value = from_utf8(value)?;
        let value = unescape(value)?;
        let value = T::deserialize_str(self, &value)?;

        *store = Some(value);

        Ok(())
    }

I also updated the doc string to make the behavior clear to the user, so you can use this as a drop-in replacement for the PR.
Since we now have a suitable test, can you please revert the changes you've made to the read_write example? Thank you :)
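
For reference, a rough sketch of what a unit test for this behavior could check. Everything here is a self-contained stand-in: `unescape_min` and `read_attrib_sketch` are hypothetical simplifications (fixed to `String`, with `String` errors and only the five predefined entities), not the crate's `unescape` or `read_attrib`.

```rust
use std::str::from_utf8;

/// Hypothetical stand-in for `unescape`: resolves only the five predefined
/// XML entities and reports anything else as an error.
fn unescape_min(s: &str) -> Result<String, String> {
    let mut out = String::with_capacity(s.len());
    let mut rest = s;
    while let Some(pos) = rest.find('&') {
        out.push_str(&rest[..pos]);
        let tail = &rest[pos + 1..];
        let end = tail
            .find(';')
            .ok_or_else(|| "unterminated entity".to_string())?;
        out.push(match &tail[..end] {
            "amp" => '&',
            "lt" => '<',
            "gt" => '>',
            "quot" => '"',
            "apos" => '\'',
            other => return Err(format!("unknown entity: {other}")),
        });
        rest = &tail[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

/// Simplified stand-in for `read_attrib`: reject duplicates, convert the
/// bytes to UTF-8, unescape, then store the value.
fn read_attrib_sketch(
    store: &mut Option<String>,
    name: &str,
    value: &[u8],
) -> Result<(), String> {
    if store.is_some() {
        return Err(format!("duplicate attribute: {name}"));
    }
    let value = from_utf8(value).map_err(|e| e.to_string())?;
    *store = Some(unescape_min(value)?);
    Ok(())
}

fn main() {
    let mut store = None;
    read_attrib_sketch(&mut store, "name", b"foo&gt;").unwrap();
    assert_eq!(store.as_deref(), Some("foo>"));

    // A second value for the same attribute is rejected.
    assert!(read_attrib_sketch(&mut store, "name", b"bar").is_err());
}
```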
