This repository was archived by the owner on May 8, 2024. It is now read-only.

Parsing is very slow #25

@ferrous26

There are a few reasons why:

  1. Handsoap::XmlQueryFront::NokogiriDriver#to_s is very inefficient

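The method uses a lot of string literals that are constant and never need to be modified. Unless string literals are frozen, Ruby allocates a new String every time a literal is evaluated, so each call to the method pays for those allocations; hoisting them into constants avoids that. There are also several chained #gsub calls, each of which returns a new String, where #gsub! could modify the string in place instead.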
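To make the #gsub versus #gsub! point concrete, here is a small plain-Ruby sketch. It is not Handsoap code, and the constant and method names are made up for illustration. Note that #gsub! returns nil when nothing was substituted, which is why the bang calls are written as separate statements rather than chained.

```ruby
# Hypothetical helpers for illustration only; not part of Handsoap.
LT_ENTITY = '&lt;'.freeze
GT_ENTITY = '&gt;'.freeze
LT_CHAR   = '<'.freeze
GT_CHAR   = '>'.freeze

# Chained #gsub: every call allocates a fresh intermediate String.
def unescape_with_copies(text)
  text.gsub(LT_ENTITY, LT_CHAR).gsub(GT_ENTITY, GT_CHAR)
end

# #gsub!: one copy up front, then in-place edits on that copy.
# #gsub! returns nil when no substitution happened, so the calls
# cannot be chained safely and the result is returned explicitly.
def unescape_in_place(text)
  result = text.dup
  result.gsub!(LT_ENTITY, LT_CHAR)
  result.gsub!(GT_ENTITY, GT_CHAR)
  result
end
```

Both helpers return the same text; the second one just avoids the intermediate copies.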

There is also a comment about the Nokogiri API being unstable. I'm not sure that is still the case, so I overrode this method to just call #content on the backing Nokogiri node. My solution looks something like this:

diff --git a/lib/handsoap/xml_query_front.rb b/lib/handsoap/xml_query_front.rb
index 3df435c..742d7e1 100644
--- a/lib/handsoap/xml_query_front.rb
+++ b/lib/handsoap/xml_query_front.rb
@@ -168,9 +168,8 @@ module Handsoap
       # Returns the underlying native element.
       #
       # You shouldn't need to use this, since doing so would void portability.
-      def native_element
-        @element
-      end
+      attr_reader :native_element
+
       # Returns the node name of the current element.
       def node_name
         raise NotImplementedError.new
@@ -350,13 +349,34 @@ module Handsoap
           element = @element.children.first
         end
         return if element.nil?
+        string = element.content
+
         # This looks messy because it is .. Nokogiri's interface is in a flux
         if element.kind_of?(Nokogiri::XML::CDATA)
-          element.serialize(:encoding => 'UTF-8').gsub(/^<!\[CDATA\[/, "").gsub(/\]\]>$/, "")
+          string.gsub!(EBEGIN_CDATA, BLANK_STRING)
+          string.gsub!(EEND_CDATA,   BLANK_STRING)
         else
-          element.serialize(:encoding => 'UTF-8').gsub('&lt;', '<').gsub('&gt;', '>').gsub('&quot;', '"').gsub('&apos;', "'").gsub('&amp;', '&')
+          string.gsub!(ELT,   LT)
+          string.gsub!(EGT,   GT)
+          string.gsub!(EQUOT, QUOT)
+          string.gsub!(EAPOS, APOS)
+          string.gsub!(EAMP,  AMP)
         end
-      end
+        string
+      end
+      EBEGIN_CDATA = /^<!\[CDATA\[/
+      EEND_CDATA   = /\]\]>$/
+      BLANK_STRING = ''
+      ELT          = '&lt;'
+      LT           = '<'
+      EGT          = '&gt;'
+      GT           = '>'
+      EQUOT        = '&quot;'
+      QUOT         = '"'
+      EAPOS        = '&apos;'
+      APOS         = "'"
+      EAMP         = '&amp;'
+      AMP          = '&'
     end
   end
 end
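One detail of the patch worth keeping is that `&amp;` is unescaped last. The plain-Ruby sketch below (method names are hypothetical, not from Handsoap) shows why the order matters for input such as `&amp;lt;`, which should decode to the literal text `&lt;`.

```ruby
# Illustration only; the method names are hypothetical.

# '&amp;' is handled last, in the same order as the patch.
def unescape_amp_last(text)
  result = text.dup
  result.gsub!('&lt;', '<')
  result.gsub!('&gt;', '>')
  result.gsub!('&quot;', '"')
  result.gsub!('&apos;', "'")
  result.gsub!('&amp;', '&')
  result
end

# Decoding '&amp;' first creates new entities that the later
# substitutions then decode a second time.
def unescape_amp_first(text)
  result = text.dup
  result.gsub!('&amp;', '&')
  result.gsub!('&lt;', '<')
  result.gsub!('&gt;', '>')
  result.gsub!('&quot;', '"')
  result.gsub!('&apos;', "'")
  result
end

escaped = 'a &amp;lt; b'
unescape_amp_last(escaped)   # => "a &lt; b"
unescape_amp_first(escaped)  # => "a < b"
```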
  2. All the data transformers use #to_s

Every transformer therefore pays the full cost of #to_s. Even if #to_s is fixed, I do not think the other transformers need to unescape the escape sequences at all, do they?
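As a rough sketch of the idea, using a stand-in class rather than Handsoap's actual implementation (`StubElement`, `to_i_via_to_s`, `to_i_direct`, and `unescape_calls` are all invented for illustration): a numeric transformer could read the raw text directly, since the text of a valid integer cannot contain an escape sequence.

```ruby
# Stand-in for a driver element; not Handsoap's real class.
class StubElement
  attr_reader :unescape_calls

  def initialize(raw_text)
    @raw_text = raw_text
    @unescape_calls = 0
  end

  # Full unescaping pass, needed only when the caller wants a String.
  def to_s
    @unescape_calls += 1
    @raw_text.gsub('&lt;', '<').gsub('&gt;', '>').gsub('&amp;', '&')
  end

  # Goes through #to_s and pays for unescaping it does not need.
  def to_i_via_to_s
    to_s.to_i
  end

  # Converts the raw text directly, skipping the unescape pass.
  def to_i_direct
    @raw_text.to_i
  end
end

node = StubElement.new('42')
node.to_i_direct     # => 42, without an unescape pass
node.to_i_via_to_s   # => 42, after one unescape pass
```

Whether this is safe for the other drivers would still need checking, as noted below.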

I don't really have the time to fix this right now and also make sure I don't break the other drivers. :(

  3. Using XPath is not very efficient for large data structures

Re-walking the XML subtree for every query is expensive for big data structures. I'm not sure if this is a problem for Handsoap itself, but maybe a note should be added to the documentation.
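A plain-Ruby sketch of the difference, using a stand-in `Node` struct instead of a real XML library (all names here are hypothetical): looking up each field by name rescans the children every time, while a single pass visits each child once.

```ruby
# Stand-in for an XML element; illustration only.
Node = Struct.new(:name, :text, :children)

# One scan of the children per requested field, similar in spirit
# to issuing one XPath query per field.
def extract_by_lookup(parent, fields)
  fields.each_with_object({}) do |field, result|
    child = parent.children.find { |c| c.name == field }
    result[field] = child.text if child
  end
end

# A single pass over the children, keeping the first occurrence of
# each wanted field.
def extract_in_one_pass(parent, fields)
  wanted = fields.to_h { |f| [f, true] }
  parent.children.each_with_object({}) do |child, result|
    next unless wanted.key?(child.name)
    result[child.name] = child.text unless result.key?(child.name)
  end
end
```

For a parent with many children and many requested fields, the first version does work proportional to fields times children, while the second is proportional to fields plus children.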

I have worked around all of these issues in a gem that uses handsoap: http://github.com/Marketcircle/jiraSOAP.
