  1. #1
    Registered User impazzito's Avatar
    Join Date
    Apr 2008
    Posts
    28
    Hi to all!

    This is my script.
    I have a problem with the encoding of the text...

    There are some characters like "è ’ è" that are not displayed correctly.

    Is there a method to decode them?

    Thanks

    Code:
    import appuifw
    import globalui
    appuifw.app.orientation='automatic'
    import re
    import urllib
    
    def nascondi():
     try: appuifw.e32.start_exe(u'z:\\sys\\bin\\phone.exe','')
     except: pass
    
    #nascondi()
    
    def ricevi():
     urlWEB = urllib.urlopen
     response = urlWEB('http://oroscopo.libero.it/oro_tab/l.phtml').read()
     try:
      oroscopo = (re.search('<div class="oroscopo_oggi">\x0a(.*?)\x0a</div>', response).group(1))
     except:
      oroscopo = 0
     globalui.global_msg_query(u""+(oroscopo), u"Pesci:")
    
    ricevi()
    appuifw.app.set_exit()

  2. #2
    Nokia Developer Champion marcelobarrosalmeida's Avatar
    Join Date
    Nov 2007
    Location
    Sertaozinho/Brazil
    Posts
    752
    Hello impazzito

    You need to check which encoding the HTTP server is using before converting the response to unicode. This issue is a big source of problems and headaches.

    My current function to convert from html to unicode is below (decode_html). It is used in wordmobi.

    Code:
    import re
    
    def safe_unicode(value):
        "http://aspyplayer.googlecode.com/svn/trunk/src/aspyplayer.py"
        # Already unicode: nothing to decode.
        if isinstance(value, unicode):
            return value
    
        result = ""
        # Try UTF-8 first; latin1 accepts any byte sequence, so it acts as the fallback.
        for enc in ['utf8', 'latin1']:
            try:
                result = value.decode(enc)
                break
            except:
                pass
    
        return unicode(result)
    
    def decode_html(line):
        "http://mail.python.org/pipermail/python-list/2006-April/378536.html"
        # Replace decimal character references such as &#224; with the character itself.
        pat = re.compile(r'&#(\d+);')
        def sub(mo):
            return unichr(int(mo.group(1)))
        return pat.sub(sub, safe_unicode(line))
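
    As a rough illustration of "check the encoding your http server is using": the charset is usually announced in the Content-Type response header. The sketch below is written for a current Python 3 interpreter, not for PyS60, and the names charset_from_content_type and decode_body are hypothetical helpers invented for this example. On Python 2 the header value could probably be read with urllib.urlopen(url).info().getheader('Content-Type'), but that part is not shown or tested here.

```python
import re

def charset_from_content_type(header, default='latin1'):
    """Return the charset named in a Content-Type header value, or `default`."""
    if not header:
        return default
    match = re.search(r'charset\s*=\s*["\']?([\w.:-]+)', header, re.I)
    if match:
        return match.group(1).lower()
    return default

def decode_body(raw_bytes, header):
    """Decode a response body using the announced charset, falling back to latin1."""
    charset = charset_from_content_type(header)
    try:
        return raw_bytes.decode(charset)
    except (UnicodeError, LookupError):
        # Wrong or unknown charset: latin1 can decode any byte sequence.
        return raw_bytes.decode('latin1')
```

    For example, decode_body(b'perch\xe9', 'text/html; charset=ISO-8859-1') gives the text "perché". If the server sends no charset at all, the sketch simply assumes latin1, which is a guess and may be wrong for some pages.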

  3. #3
    Registered User impazzito's Avatar
    Join Date
    Apr 2008
    Posts
    28
    Thanks for your reply, marcelobarrosalmeida!
    In the source of the page there are characters like "& # 2 2 4 ;" (without the blank spaces).

    It is iso-8859-1 or windows-1252, but I don't know how to view it in my Python shell!

    Can you help me? I have tried to use ".decompile(iso-8859-1)" but it doesn't work!
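
    For reference, the string method for this is decode (not decompile), and the encoding name has to be passed as a quoted string. A minimal sketch for a current Python 3 shell follows; the sample bytes are made up for illustration, and whether the 'windows-1252' codec is available on a given PyS60 build is an assumption.

```python
# Bytes as they might arrive from an ISO-8859-1 page (hypothetical sample).
raw = b'perch\xe9'

# decode() turns bytes into text; the encoding name is a quoted string.
text = raw.decode('iso-8859-1')

# ascii() shows non-ASCII characters as escapes, which helps in a console
# that cannot display them (on Python 2, repr() plays the same role).
shown = ascii(text)

# windows-1252 differs from iso-8859-1 in the range 0x80-0x9F,
# for example byte 0x92 is the curly apostrophe.
apostrophe_text = b'l\x92amore'.decode('windows-1252')
```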






  4. #4
    Nokia Developer Champion marcelobarrosalmeida's Avatar
    Join Date
    Nov 2007
    Location
    Sertaozinho/Brazil
    Posts
    752
    Hello

    Have you tried the code? I mean:

    Code:
    unicode_text = decode_html(text_from_web)
    It will convert the "strange" chars that start with "&#", creating a new unicode string.
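
    To make the conversion concrete, here is a small stand-alone sketch of the same idea for a current Python 3 interpreter (on PyS60 / Python 2, chr would be unichr). The name decode_numeric_refs is hypothetical. The remapping of codes 128 to 159 through windows-1252 is an assumption added for this example and is not part of decode_html above: pages written on Windows often use &#146; for a curly apostrophe, and browsers generally display it that way.

```python
import re

# Matches decimal numeric character references such as &#224;
NUMERIC_REF = re.compile(r'&#(\d+);')

def decode_numeric_refs(text):
    """Replace decimal character references in `text` with the characters they name."""
    def replace(match):
        code = int(match.group(1))
        if 128 <= code <= 159:
            # Assumption: the page author meant windows-1252, where for
            # example 146 is the curly apostrophe.
            try:
                return bytes([code]).decode('cp1252')
            except UnicodeDecodeError:
                pass  # a few codes in this range are undefined in windows-1252
        return chr(code)  # on Python 2 / PyS60 this would be unichr(code)
    return NUMERIC_REF.sub(replace, text)
```

    With this sketch, "citt&#224;" becomes "città" and "l&#146;amore &#232; cieco" becomes "l’amore è cieco". Hexadecimal references (&#xE0;) and named entities (&egrave;) are not handled.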

  5. #5
    Registered User impazzito's Avatar
    Join Date
    Apr 2008
    Posts
    28
    No, sorry, but I'm a complete novice in Python. Do I have to add your code to my code?

  6. #6
    Nokia Developer Champion marcelobarrosalmeida's Avatar
    Join Date
    Nov 2007
    Location
    Sertaozinho/Brazil
    Posts
    752
    Yes! Just copy the code snippet. Something like this (not tested, but it should work):

    Code:
    import appuifw
    import globalui
    import re
    import urllib
    
    def safe_unicode(value):
        "http://aspyplayer.googlecode.com/svn/trunk/src/aspyplayer.py"
        # Already unicode: nothing to decode.
        if isinstance(value, unicode):
            return value
    
        result = ""
        # Try UTF-8 first; latin1 accepts any byte sequence, so it acts as the fallback.
        for enc in ['utf8', 'latin1']:
            try:
                result = value.decode(enc)
                break
            except:
                pass
    
        return unicode(result)
    
    def decode_html(line):
        "http://mail.python.org/pipermail/python-list/2006-April/378536.html"
        # Replace decimal character references such as &#224; with the character itself.
        pat = re.compile(r'&#(\d+);')
        def sub(mo):
            return unichr(int(mo.group(1)))
        return pat.sub(sub, safe_unicode(line))
    
    def ricevi():
        url = 'http://oroscopo.libero.it/oro_tab/l.phtml'
        response = urllib.urlopen(url).read()
        try:
            oroscopo = (re.search('<div class="oroscopo_oggi">\x0a(.*?)\x0a</div>', response).group(1))
        except:
            oroscopo = u"Impossible to decode"
    
        msg = decode_html(oroscopo) # <<<<<=========== HERE
        globalui.global_msg_query(msg, u"Pesci:")
    
    appuifw.app.orientation='automatic'
    ricevi()
    appuifw.app.set_exit()
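
    The extraction step in ricevi() can be tried offline, without a phone or a network connection. The sketch below targets a current Python 3 interpreter; SAMPLE_PAGE is invented test data, not the real page, and extract_horoscope is a hypothetical helper. It relaxes the pattern slightly (any whitespace around the text, and re.DOTALL so the text may span several lines), which is an assumption about how the page might be laid out.

```python
import re

# Invented sample standing in for the downloaded page.
SAMPLE_PAGE = (
    '<html><body>\n'
    '<div class="oroscopo_oggi">\n'
    'Oggi la giornata &#232; serena.\n'
    '</div>\n'
    '</body></html>'
)

# DOTALL lets "." match newlines, so multi-line text is captured too.
PATTERN = re.compile(r'<div class="oroscopo_oggi">\s*(.*?)\s*</div>', re.DOTALL)

def extract_horoscope(page, default=''):
    """Return the text inside the oroscopo_oggi div, or `default` if it is missing."""
    match = PATTERN.search(page)
    if match is None:
        return default
    return match.group(1)
```

    Here extract_horoscope(SAMPLE_PAGE) returns "Oggi la giornata &#232; serena.", which would then be passed to decode_html. A regular expression like this stops at the first closing </div>, so it would cut the text short if the div ever contained nested divs.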

  7. #7
    Registered User impazzito's Avatar
    Join Date
    Apr 2008
    Posts
    28
    WOWOWOWOW!!! It works!!!
    Can you explain why I have to use http://aspyplayer.googlecode.com/svn/trunk/src/aspyplayer.py and http://mail.python.org/pipermail/python-list/2006-April/378536.html in my code? And what will happen if these links are removed in the future?

    Thanksssss!!!





  8. #8
    Nokia Developer Champion marcelobarrosalmeida's Avatar
    Join Date
    Nov 2007
    Location
    Sertaozinho/Brazil
    Posts
    752
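    Hi,

    Those two lines are only documentation strings: Python never opens the links, so nothing happens if the pages disappear, and you can remove the lines (ordinary comments in Python start with "#"). I decided to keep them to give credit to the original authors.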
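
    A small sketch of the difference, runnable on a current Python 3 interpreter. The function names and the example.org URL are placeholders invented for this illustration.

```python
def with_docstring(value):
    "http://example.org/where-this-came-from"
    return value * 2

def with_comment(value):
    # http://example.org/where-this-came-from
    return value * 2

# Both functions behave identically; the string is only stored as documentation.
same_result = with_docstring(21) == with_comment(21)
stored_doc = with_docstring.__doc__   # the URL text
no_doc = with_comment.__doc__         # None, comments are discarded
```

    A bare string as the first statement of a function is kept in its __doc__ attribute, while a "#" comment is dropped entirely. Neither one affects what the function returns.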
