I needed to interact with an XML-RPC server written using the
xmlrpc-c library for C/C++. I was using Ruby 1.8.4 and found that I
could not get a simple XML-RPC client written in Ruby to
communicate with the xmlrpc-c server.
I kept getting the following error:

    /usr/local/lib/ruby/1.8/xmlrpc/client.rb:547:in `do_rpc': HTTP-Error: 400 Bad Request (RuntimeError)
        from /usr/local/lib/ruby/1.8/xmlrpc/client.rb:420:in `call2'
        from /usr/local/lib/ruby/1.8/xmlrpc/client.rb:410:in `call'
        from littleclient.rb:7

I tried downgrading to Ruby 1.8.2 and it worked fine.
When I investigated the difference I found the following in the
xmlrpc/client.rb file that comes with Ruby 1.8.4:

    def do_rpc(request, async=false)
      header = {
        "User-Agent"     => USER_AGENT,
        "Content-Type"   => "text/xml; charset=utf-8",  # 1.8.4: charset parameter added
        "Content-Length" => request.size.to_s,
        "Connection"     => (async ? "close" : "keep-alive")
      }
      # ... (rest of do_rpc omitted)

This differs from the client.rb included with Ruby 1.8.2:

    def do_rpc(request, async=false)
      header = {
        "User-Agent"     => USER_AGENT,
        "Content-Type"   => "text/xml ",  # 1.8.2: no charset parameter
        "Content-Length" => request.size.to_s,
        "Connection"     => (async ? "close" : "keep-alive")
      }
      # ... (rest of do_rpc omitted)

So I changed the code in the 1.8.4 version of client.rb to remove
"; charset=utf-8", and after that the Ruby client interacted fine with the
xmlrpc-c server.
I'm wondering if utf-8 should be the default charset for Ruby's xmlrpc
client implementation. Also, could it perhaps be made selectable by
adding an accessor method to the Client class?
Phil
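Phil's accessor idea can be sketched without touching the real library. The class below is hypothetical (it is not part of Ruby's xmlrpc library): it builds the same four headers as the do_rpc excerpts above, but reads Content-Type from an accessor that defaults to the 1.8.4 value. The User-Agent string is a placeholder, and the sketch uses `bytesize` for Content-Length because that header counts bytes (under Ruby 1.8, `String#size` already returned a byte count).

```ruby
# Hypothetical sketch: NOT the real XMLRPC::Client API.
class RpcHeaderBuilder
  # Placeholder User-Agent; the real client.rb uses its own USER_AGENT constant.
  USER_AGENT = "XMLRPC::Client (Ruby #{RUBY_VERSION})"
  # Default matches the Content-Type sent by Ruby 1.8.4's client.rb.
  DEFAULT_CONTENT_TYPE = "text/xml; charset=utf-8"

  # Callers can override the value, e.g. with "text/xml" for picky servers.
  attr_accessor :content_type

  def initialize(content_type = DEFAULT_CONTENT_TYPE)
    @content_type = content_type
  end

  # Returns the header hash for one request body.
  def build(request, async = false)
    {
      "User-Agent"     => USER_AGENT,
      "Content-Type"   => @content_type,
      "Content-Length" => request.bytesize.to_s,  # byte count, not character count
      "Connection"     => (async ? "close" : "keep-alive")
    }
  end
end

builder = RpcHeaderBuilder.new
builder.content_type = "text/xml"  # e.g. for the xmlrpc-c server described above
puts builder.build("<methodCall/>")["Content-Type"]  # prints text/xml
```

Keeping the 1.8.4 default means existing callers see no change, while the accessor gives a server like xmlrpc-c a way out without patching do_rpc.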
xmlrpc and charset=utf-8
on 18.04.2006 01:44
Re: xmlrpc and charset=utf-8
on 18.06.2006 05:01
--- Phil Tomson <rubyfan@gmail.com> wrote:
> [...]
> I'm wondering if utf-8 should be the default charset
> for Ruby's xmlrpc
> client implementation? Also, I'm wondering if
> perhaps it could be
> selectable by adding an accessor method to the
> client to the Client
> class?
>
> Phil

Was this ever addressed? I vote for both a default of utf8 and an accessor method.

Regards,
Dan
Re: xmlrpc and charset=utf-8
on 18.09.2007 03:03
Dominique Brezinski wrote:
> [...]

It doesn't seem that anything ever came of this. I would like to re-open the topic for discussion, with another vote for defaulting the Content-Type header to "text/xml; charset=utf-8" but adding an accessor so this value can be overridden.

My specific need comes from trying to interface with weblog software via the MetaWeblog API. Some blog packages incorrectly return invalid content-type faults because they don't recognize the charset parameter. Currently I have overridden do_rpc to set "Content-Type" => "text/xml", but this seems less than ideal.

-Jesse
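Jesse's failure happens on the receiving side: a server that compares the whole Content-Type header with the literal string "text/xml" will reject "text/xml; charset=utf-8". A more tolerant check compares only the type/subtype, which MIME treats case-insensitively, and ignores any parameters. The helper name is hypothetical, and also accepting application/xml is an assumption, not something the thread or the XML-RPC spec requires.

```ruby
# Hypothetical helper for an XML-RPC endpoint: accept an XML media type
# regardless of parameters such as charset=utf-8.
ACCEPTED_XML_TYPES = %w[text/xml application/xml].freeze  # assumption

def xml_content_type?(header)
  # "text/xml; charset=utf-8" -> "text/xml"; nil or "" -> ""
  media_type = header.to_s.split(";", 2).first.to_s.strip.downcase
  ACCEPTED_XML_TYPES.include?(media_type)
end

p xml_content_type?("text/xml")                  # prints true
p xml_content_type?("text/xml; charset=utf-8")   # prints true
p xml_content_type?("Text/XML ; charset=UTF-8")  # prints true
p xml_content_type?("text/html")                 # prints false
```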
Re: xmlrpc and charset=utf-8
on 18.09.2007 12:10
At 10:02 07/09/18, jesse_c wrote:
>>> > | In this case, MIME and XML processors MUST assume the charset is
>>> > | "us-ascii"
>>>
>>> This is interesting. It seems to be at odds with the XML specification,
>>> which
>>> says:

It seems to be at odds, but it's not.

>>> http://www.w3.org/TR/2006/PER-xml-20060614/#charencoding
>>>
>>> > In the absence of information provided by an external transport
>>> protocol

The external protocol provides a MIME type of text/xml, which, as defined, defaults to US-ASCII. Therefore, there is external information.

>>> > ordinary ASCII entities do not strictly need an encoding
>>> declaration.
>>>
>>> I read this to say that XML documents, in the absence of both external
>>> encoding information or an XML declaration, must be assumed to be UTF-8.
>>> RFC3023 appears to be saying that XML documents default to US-ASCII.

Yes, if they are served with a MIME type of text/xml (without a charset parameter), because that's part of the definition of text/xml. Absence of an explicit "us-ascii" label and absence of information are not the same. That may all sound a bit far-fetched, but that's how things are defined in the specs, sorry.

>It doesn't seem that anything ever became of this. I would like to re-open
>the topic for discussion with another vote for defaulting the Content-Type
>header to "text/xml; charset=utf-8" but adding an accessor so this value can
>be overridden.

Adding an accessor is definitely a very good idea. Another idea is to change the default to "application/xml". "application/xml" does NOT imply US-ASCII, but (unless it comes with a charset parameter) means 'look at the XML document itself' (which, in the case of no BOM and no encoding declaration, means UTF-8).

>http://www.nabble.com/xmlrpc-and-charset%3Dutf-8-tf1465065.html#a12748102
>Sent from the ruby-core mailing list archive at Nabble.com.

Oh, great. That one is much easier to use than the 'default' one at blade.nagaokaut.ac.jp.
By the way, there is now an official way to include pointers such as the above in a mail header. Please see http://www.ietf.org/internet-drafts/draft-duerst-archived-at-09.txt.

Regards, Martin.

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
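Martin's distinction between the two media types can be written down as a small rule. The sketch below (the function name is hypothetical) returns the charset that a Content-Type header implies under RFC 3023 as quoted in this thread: an explicit charset parameter wins, text/xml without one means US-ASCII, and application/xml without one returns nil, meaning "look at the BOM and encoding declaration in the document itself". Note that RFC 7303, which later replaced RFC 3023, dropped the US-ASCII default for text/xml, so a present-day implementation would treat the two types alike.

```ruby
# Hypothetical helper: charset implied by a Content-Type header per RFC 3023.
# Returns nil when the receiver must inspect the XML document itself.
def transport_charset(content_type)
  media_type, *params = content_type.split(";").map(&:strip)
  # An explicit charset parameter always takes precedence.
  params.each do |param|
    name, value = param.split("=", 2)
    next if value.nil? || name.strip.downcase != "charset"
    return value.strip.delete('"').downcase
  end
  # RFC 3023: text/xml without charset means US-ASCII; application/xml
  # (or anything else) without charset defers to the document.
  media_type.to_s.downcase == "text/xml" ? "us-ascii" : nil
end

p transport_charset("text/xml")                 # prints "us-ascii"
p transport_charset("text/xml; charset=utf-8")  # prints "utf-8"
p transport_charset("application/xml")          # prints nil
```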
Re: xmlrpc and charset=utf-8
on 19.06.2006 13:11
On Saturday 17 June 2006 23:00, Daniel Berger wrote:
> > I'm wondering if utf-8 should be the default charset
> > for Ruby's xmlrpc client implementation?
...
> Was this ever addressed? I vote for both a default of
> utf8 and an accessor method.

Well... FWIW, XML documents are UTF-8 unless otherwise specified by an XML declaration. The HTTP header should reflect the encoding of the payload.

--- SER
Re: xmlrpc and charset=utf-8
on 19.06.2006 19:38
>>>>> On Sun, 18 Jun 2006 12:00:19 +0900
>>>>> djberg96@yahoo.com (Daniel Berger) said:

> Was this ever addressed? I vote for both a default of
> utf8 and an accessor method.

http://www.zvon.org/tmRFC/RFC3023/Output/chapter8.html#sub5
| This example shows text/xml with the charset parameter omitted.
| In this case, MIME and XML processors MUST assume the charset is "us-ascii"

That is the reason for charset=utf-8. The reason for not having an accessor method is that encoding conversions depend on the platform (see ext/iconv/charset_alias.rb).
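The argument for sending charset=utf-8 can be made concrete. Because US-ASCII is a subset of UTF-8, a request body containing only 7-bit bytes is labeled accurately by bare text/xml, but one containing, say, an accented character is only labeled accurately with the charset parameter. Stripping the parameter, as in the workaround at the top of the thread, is therefore only correct for ASCII-only payloads. The helper names below are hypothetical, and the sketch assumes the body is already UTF-8 encoded.

```ruby
# True when every byte is 7-bit, i.e. the body is valid US-ASCII
# (and, since ASCII is a subset of UTF-8, also valid UTF-8).
def ascii_only_bytes?(body)
  body.each_byte.all? { |byte| byte < 0x80 }
end

# Hypothetical helper: the narrowest Content-Type that still describes a
# UTF-8 body correctly under RFC 3023's US-ASCII default for text/xml.
def accurate_content_type(body)
  ascii_only_bytes?(body) ? "text/xml" : "text/xml; charset=utf-8"
end

puts accurate_content_type("<string>cafe</string>")       # prints text/xml
puts accurate_content_type("<string>caf\u00e9</string>")  # prints text/xml; charset=utf-8
```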
Re: xmlrpc and charset=utf-8
on 19.06.2006 22:39
I first sent this from the wrong email account, so if that post somehow makes its way onto the list, please forgive the repetition.

On Monday 19 June 2006 13:35, Kazuhiro NISHIYAMA wrote:
> > Was this ever addressed? I vote for both a default of
> > utf8 and an accessor method.
>
> http://www.zvon.org/tmRFC/RFC3023/Output/chapter8.html#sub5
>
> | This example shows text/xml with the charset parameter omitted.
> | In this case, MIME and XML processors MUST assume the charset is
> | "us-ascii"

This is interesting. It seems to be at odds with the XML specification, which says:

http://www.w3.org/TR/2006/PER-xml-20060614/#charencoding

> In the absence of information provided by an external transport protocol
> (e.g. HTTP or MIME), it is a fatal error for an entity including an
> encoding declaration to be presented to the XML processor in an encoding
> other than that named in the declaration, or for an entity which begins
> with neither a Byte Order Mark nor an encoding declaration to use an
> encoding other than UTF-8. Note that since ASCII is a subset of UTF-8,
> ordinary ASCII entities do not strictly need an encoding declaration.

I read this to say that XML documents, in the absence of both external encoding information or an XML declaration, must be assumed to be UTF-8. RFC3023 appears to be saying that XML documents default to US-ASCII.

Now, granted, RFC3023 is a transport protocol, and they're basically saying that if you don't specify the encoding, then assume that the content is US-ASCII. However, I find it strange that they specifically require XML processors to assume that unannotated documents are ASCII encoded, which is in opposition to the XML spec.

In any case, it appears that the Ruby XML-RPC library is handling data correctly, while the C library is not (since it appears to be ignoring the HTTP header encoding information).

--- SER
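Sean's reading of the XML specification (no BOM and no encoding declaration means UTF-8) corresponds to a simple detection order. The function below is a simplified, hypothetical sketch of that order: it covers only the UTF-8 and UTF-16 byte order marks and a declaration written in an ASCII-compatible encoding, not the full autodetection table in Appendix F of the XML spec (for example, a UTF-32LE BOM would be misread as UTF-16LE). It assumes Ruby 2.0 or later for `String#b`.

```ruby
UTF8_BOM    = "\xEF\xBB\xBF".b
UTF16BE_BOM = "\xFE\xFF".b
UTF16LE_BOM = "\xFF\xFE".b

# Hypothetical helper: encoding of an XML entity when no external
# (transport) information is available.
def xml_document_encoding(data)
  bytes = data.b  # inspect raw bytes, whatever the string's Ruby encoding
  # 1. A byte order mark identifies the encoding.
  return "UTF-8"    if bytes.start_with?(UTF8_BOM)
  return "UTF-16BE" if bytes.start_with?(UTF16BE_BOM)
  return "UTF-16LE" if bytes.start_with?(UTF16LE_BOM)
  # 2. Otherwise use the encoding declaration, if the XML declaration has one.
  decl = bytes[/\A<\?xml[^>]*\?>/]
  if decl && decl =~ /encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/
    return $1.upcase
  end
  # 3. No BOM and no declaration: the XML spec requires UTF-8.
  "UTF-8"
end

puts xml_document_encoding(%(<?xml version="1.0" encoding="iso-8859-1"?><a/>))  # prints ISO-8859-1
puts xml_document_encoding("<methodCall/>")                                       # prints UTF-8
```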
Re: xmlrpc and charset=utf-8
on 19.06.2006 23:11
On 6/19/06, Sean Russell <ser@germane-software.com> wrote:
> > | In this case, MIME and XML processors MUST assume the charset is
> [...]
> > other than that named in the declaration, or for an entity which begins
> > with neither a Byte Order Mark nor an encoding declaration to use an
> > encoding other than UTF-8. Note that since ASCII is a subset of UTF-8,
> > ordinary ASCII entities do not strictly need an encoding declaration.
>
> I read this to say that XML documents, in the absence of both external
> encoding information or an XML declaration, must be assumed to be UTF-8.
> RFC3023 appears to be saying that XML documents default to US-ASCII.

You are correct in your interpretation of the XML spec, and I agree that the XML-RPC C library mentioned appears to be the flawed implementation. The XML spec reads:

  Although an XML processor is required to read only entities in the UTF-8 and UTF-16 encodings, it is recognized that other encodings are used around the world, and it may be desired for XML processors to read entities that use them. In the absence of external character encoding information (such as MIME headers), parsed entities which are stored in an encoding other than UTF-8 or UTF-16 MUST begin with a text declaration (see 4.3.1 The Text Declaration) containing an encoding declaration....

And RFC 3023 states that the charset parameter of the text/xml registration is strongly recommended. The following description of the charset parameter is straight from RFC 3023:

  Although listed as an optional parameter, the use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the character encoding of the XML MIME entity. The charset parameter can also be used to provide protocol-specific operations, such as charset-based content negotiation in HTTP. "utf-8" [RFC2279] is the recommended value, representing the UTF-8 charset. UTF-8 is supported by all conforming processors of [XML].

Cheers,
Dom
Re: xmlrpc and charset=utf-8
on 27.09.2007 03:19
On Tuesday 18 September 2007, Martin Duerst wrote:
> >>> I read this to say that XML documents, in the absence of both external
> >>> encoding information or an XML declaration, must be assumed to be
> >>> UTF-8. RFC3023 appears to be saying that XML documents default to
> >>> US-ASCII.
>
> Yes, if they come served with a MIME type of text/xml (without charset
> parameter), because that's part of the definition of text/xml. Absence of
> an explicit "us-ascii" label and absence of information are not the same.
> That all may sound a bit far-fetched, but that's how things are defined in
> the specs, sorry.

If the external transport specifies the encoding, then it is up to the code that is processing the transport to set the encoding of the XML document via the API. The XML parser can't know anything about the transport. That is to say, it is *still* not the parser's responsibility to guess that the encoding is anything other than UTF-8; it must be told otherwise. Put another way, the code accepting the content must tell the parser what encoding the stream is using, if it is using anything other than UTF-8.

> Adding an accessor is definitely a very good idea. Another idea is to
> change the default to "application/xml". "application/xml" does NOT
> imply US-ASCII, but (unless it comes with a charset parameter) means
> 'look at the XML document itself' (which in case of no BOM and no
> encoding declaration means UTF-8).

Another option is to have XMLRPC explicitly set the encoding to whatever the transport says it is. Of course, this would require that XMLRPC parse the first line of the file and make sure that the encoding isn't already specified in the document itself, but that isn't too difficult.
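Sean's second option, passing the transport's encoding along unless the document already declares one, could look roughly like this when the parser only sees the document text. The helper is hypothetical, works at the string level (it assumes no byte order mark precedes the document), and writes the encoding pseudo-attribute straight after version, since the XML declaration requires the order version, encoding, standalone.

```ruby
# Matches the XML declaration at the very start of a document.
XML_DECL = /\A<\?xml([^>]*?)\?>/

# Hypothetical helper: make a charset learned from the transport (e.g. the
# HTTP Content-Type) visible to a parser that only sees the document text.
# Assumes no byte order mark precedes the document.
def apply_transport_encoding(doc, charset)
  return doc if charset.nil?
  m = XML_DECL.match(doc)
  # No XML declaration at all: add one carrying the transport charset.
  return %(<?xml version="1.0" encoding="#{charset}"?>) + doc unless m

  attrs = m[1]
  # The document already declares its encoding; leave it alone.
  return doc if attrs =~ /\bencoding\s*=/

  # encoding must follow version and precede standalone.
  attrs = attrs.sub(/version\s*=\s*(["'])[^"']*\1/) { |v| %(#{v} encoding="#{charset}") }
  "<?xml#{attrs}?>" + m.post_match
end

puts apply_transport_encoding('<?xml version="1.0"?><methodCall/>', "iso-8859-1")
# prints <?xml version="1.0" encoding="iso-8859-1"?><methodCall/>
```

This keeps an existing declaration, as Sean proposed. The RFC 3023 text Dom quotes calls the charset parameter authoritative, so a stricter reading would let the transport value win instead.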
Re: xmlrpc and charset=utf-8
on 27.09.2007 19:04
Sean E. Russell wrote:
> On Tuesday 18 September 2007, Martin Duerst wrote:
> <snipped the charset and specifications discussion>
> specified in the document itself, but that isn't too difficult.

RFC 3023 states: "If an XML document -- that is, the unprocessed, source XML document -- is readable by casual users, text/xml is preferable to application/xml" and goes on to suggest that user agents which do not support text/xml can display it as text/plain. "Readable by casual users" seems a little vague to me, but this paragraph does seem to imply that the choice of MIME type should be based on the structure of the XML document being transported. To me this would support choosing a reasonable default (perhaps by parsing the XML declaration, checking for a BOM, and falling back to text/xml?) and then also providing an accessor so the user can choose to override the content-type and charset.
