Parsing is very slow #25
Description
There are a few reasons why:
- Handsoap::XmlQueryFront::NokogiriDriver#to_s is very inefficient
The method uses a lot of string literals that are constant and never modified. Ruby allocates a new String each time a string literal is evaluated, so all of them are re-created on every call to the method. Also, there are several chained #gsub calls, each of which returns a new copy of the string, where #gsub! could modify the string in place instead.
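To illustrate the two points above, here is a minimal sketch. All names in it (`ESCAPED_LT`, `unescape_in_place`, and so on) are hypothetical and not part of Handsoap, and it assumes the file does not enable `frozen_string_literal`:

```ruby
# Hypothetical constants: each String is created once, when the file loads.
ESCAPED_LT = '&lt;'.freeze
LT         = '<'.freeze
ESCAPED_GT = '&gt;'.freeze
GT         = '>'.freeze

# The literal is evaluated again on every call, so (without
# frozen_string_literal) every call returns a newly allocated String.
def literal_pattern
  '&lt;'
end

# The constant always refers to the same String object.
def constant_pattern
  ESCAPED_LT
end

# Chained #gsub: every step returns a newly allocated copy of the text.
def unescape_with_copies(text)
  text.gsub('&lt;', '<').gsub('&gt;', '>')
end

# #gsub! mutates the receiver. It returns nil when nothing matched, so the
# calls are separate statements rather than a chain.
def unescape_in_place(text)
  text.gsub!(ESCAPED_LT, LT)
  text.gsub!(ESCAPED_GT, GT)
  text
end

input = 'a &lt; b &gt; c'.dup
copy  = unescape_with_copies(input) # input is left untouched
same  = unescape_in_place(input)    # input itself is modified and returned
```

Because #gsub! returns nil when no substitution happens, chaining it the way #gsub is chained would fail on the first pattern that does not match.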
There is also a note about Nokogiri's APIs being unstable. I'm not sure if this is still the case, but I overrode this method to just call #content on the backing Nokogiri node. I have something like this as a solution:
diff --git a/lib/handsoap/xml_query_front.rb b/lib/handsoap/xml_query_front.rb
index 3df435c..742d7e1 100644
--- a/lib/handsoap/xml_query_front.rb
+++ b/lib/handsoap/xml_query_front.rb
@@ -168,9 +168,8 @@ module Handsoap
# Returns the underlying native element.
#
# You shouldn't need to use this, since doing so would void portability.
- def native_element
- @element
- end
+ attr_reader :native_element
+
# Returns the node name of the current element.
def node_name
raise NotImplementedError.new
@@ -350,13 +349,34 @@ module Handsoap
element = @element.children.first
end
return if element.nil?
+ string = element.content
+
# This looks messy because it is .. Nokogiri's interface is in a flux
if element.kind_of?(Nokogiri::XML::CDATA)
- element.serialize(:encoding => 'UTF-8').gsub(/^<!\[CDATA\[/, "").gsub(/\]\]>$/, "")
+ string.gsub!(EBEGIN_CDATA, BLANK_STRING)
+ string.gsub!(EEND_CDATA, BLANK_STRING)
else
- element.serialize(:encoding => 'UTF-8').gsub('&lt;', '<').gsub('&gt;', '>').gsub('&quot;', '"').gsub('&apos;', "'").gsub('&amp;', '&')
+ string.gsub!(ELT, LT)
+ string.gsub!(EGT, GT)
+ string.gsub!(EQUOT, QUOT)
+ string.gsub!(EAPOS, APOS)
+ string.gsub!(EAMP, AMP)
end
- end
+ string
+ end
+ EBEGIN_CDATA = /^<!\[CDATA\[/
+ EEND_CDATA = /\]\]>$/
+ BLANK_STRING = ''
+ ELT = '&lt;'
+ LT = '<'
+ EGT = '&gt;'
+ GT = '>'
+ EQUOT = '&quot;'
+ QUOT = '"'
+ EAPOS = '&apos;'
+ APOS = "'"
+ EAMP = '&amp;'
+ AMP = '&'
end
end
end
- All the data transformers use #to_s
This is costly because #to_s itself is expensive. But even if #to_s is fixed, I do not think the other transformers need to unescape the escape sequences, do they?
I don't really have the time to fix this right now and also make sure I don't break the other drivers. :(
- Using XPath is not very efficient for large data structures
Each XPath query re-walks the XML subtree, which is expensive for big data structures. I'm not sure if this is a problem for Handsoap itself, but maybe a notice should be added to the documentation.
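A rough sketch of the difference, using REXML from the Ruby distribution instead of Handsoap so that it runs on its own. The element names, `sample_xml`, and both `parse_*` methods are made up for the illustration:

```ruby
require 'rexml/document'

# Hypothetical sample payload.
def sample_xml
  <<~DOC
    <issues>
      <issue><key>A-1</key><summary>First</summary></issue>
      <issue><key>A-2</key><summary>Second</summary></issue>
    </issues>
  DOC
end

# One XPath query per field: every query searches the subtree again.
def parse_with_xpath(issue)
  {
    key:     REXML::XPath.first(issue, 'key').text,
    summary: REXML::XPath.first(issue, 'summary').text
  }
end

# Single pass: visit each child element once and dispatch on its name.
def parse_with_single_pass(issue)
  result = {}
  issue.each_element do |child|
    case child.name
    when 'key'     then result[:key] = child.text
    when 'summary' then result[:summary] = child.text
    end
  end
  result
end

doc    = REXML::Document.new(sample_xml)
issues = doc.root.elements.to_a('issue')
parsed = issues.map { |issue| parse_with_single_pass(issue) }
```

With two fields the difference is negligible. The single pass is meant to pay off when each element has many fields and the response contains many elements.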
I have worked around all of these issues in a gem that uses handsoap: http://github.com/Marketcircle/jiraSOAP.