Well, I don't understand your problem: you get the source code of the URL with the getSourceFromUrl() method, which you can find here: http://code.huypv.net/2010/12/j2me-g...ml-source.html
Next, you search for the part you want (use your PC browser to check that it occurs only once in the source code). That means creating a substring from the beginning of the needed part to the end of the needed part.
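The substring step can be sketched roughly like this. The class name SourceSlicer, the method name extractBetween, and the sample markers are hypothetical and not taken from the linked page; the code only uses String.indexOf and String.substring, which should also be available on CLDC.

```java
// Sketch only: SourceSlicer and extractBetween are hypothetical names.
class SourceSlicer {
    /**
     * Returns the text between the first occurrence of startMarker and the
     * next occurrence of endMarker, or null if either marker is missing.
     */
    static String extractBetween(String source, String startMarker, String endMarker) {
        int start = source.indexOf(startMarker);
        if (start == -1) {
            return null; // start marker not found
        }
        start += startMarker.length(); // skip past the marker itself
        int end = source.indexOf(endMarker, start);
        if (end == -1) {
            return null; // end marker not found after the start marker
        }
        return source.substring(start, end);
    }

    public static void main(String[] args) {
        // Made-up page source, for illustration only.
        String page = "<html><body><div class=\"story\">Hello <b>world</b></div></body></html>";
        String note = extractBetween(page, "<div class=\"story\">", "</div>");
        System.out.println(note); // prints: Hello <b>world</b>
    }
}
```

Returning null when a marker is missing is just one possible choice; it avoids the exception that substring would throw if indexOf returned -1.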
Now you need to remove all HTML tags from this substring. Here is the code I use for this:
(This method handles up to 50 HTML tags in the source part. You can modify it to use "while (note.indexOf("<") != -1)" instead of "for(...)", which also lets you drop the outer if and else.)
Code:
//Remove HTML-Tags
for (int i = 0; i < 50; i++) {
    int beginTag = note.indexOf("<");
    if (beginTag == -1) {
        break; // no more tags
    }
    // look for the closing bracket after the opening one
    int endTag = note.indexOf(">", beginTag);
    if (endTag == -1) {
        break; // unterminated tag, leave the rest as it is
    }
    note = note.substring(0, beginTag) + note.substring(endTag + 1);
}
"note" is the substring you found before... and here you go...
I have my HTML string at http://pastebin.com/2VY4ZU5C. From this I want to extract the Description value in J2ME. After extraction, my output should look like the text below. Can anyone help me?
Output:
President Pranab pay great tributes to Motilal Nehru on occasion of </span>150th birth anniversary. Pranab said institutions evolved by leaders like him should be strengthened instead of being destroyed. <span style="mso-spacerun:yes"> </span>He listed his achievements like his role in evolving of Public Accounts Committee and protecting independence of Legislature from the influence of the Executive by establishing a separate cadre for the Central Legislative Assembly, now Parliament. Calling himself a student of history, he said Motilal's Swaraj Party acted as a disciplined assault force in the Legislative Assembly and he was credited with evolving the system of a Public Accounts Committee which is now one of the most effective watchdogs over executive in matters of money and finance. Mukherjee also received the first set of coins and postal stamps released at the function to commemorate the event.
Have you applied the suggestion?
Code:
while (note.indexOf("<") != -1)
Hi wizard and schumi,
thanks for the reply.
Yes, I have applied the while logic too. After parsing the URL http://www.teluguone.com/news/conten...-20-17680.html and removing the HTML tags, I get my description like this output: http://pastebin.com/TXFyvhZE
Here is the logic I used to get the above description (output):
String readUrl = ReadUrl.readUrl(URL);
int divIndex = readUrl.indexOf("<div class=\"innercontenttxt\">");
divIndex = readUrl.indexOf(">", divIndex);
int endDivIndex = readUrl.indexOf("</div>", divIndex);
content = readUrl.substring(divIndex + 1, endDivIndex);
//System.out.println("Content" + content);
// Remove HTML tags
while (content.indexOf("<") != -1) {
    int beginTag = content.indexOf("<");
    int endTag = content.indexOf(">", beginTag);
    if (endTag == -1) {
        break; // unterminated tag, stop instead of looping forever
    }
    content = content.substring(0, beginTag) + content.substring(endTag + 1);
}
// Decode the HTML entities that occur in the text
String description = replace(content, "&quot;", "\"");
description = replace(description, "&nbsp;", "");
description = replace(description, "&rsquo;", "'");
description = replace(description, "&lsquo;", "'");
description = replace(description, "&ldquo;", "\"");
description = replace(description, "&rdquo;", "\"");
description = replace(description, "&ndash;", "-");
description = replace(description, "&amp;", "&");
System.out.println("Out" + description);
Last edited by pavanragi; 2012-10-05 at 08:02.
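The post above calls a replace(String, String, String) helper that is not shown. A minimal sketch of what such a helper might look like follows; the class name TextUtil and the implementation are assumptions, not the poster's actual code. It uses StringBuffer and indexOf(String, int), which should be usable on CLDC as well as on Java SE.

```java
// Sketch only: TextUtil is a hypothetical class; the thread does not show
// how its replace(...) helper is implemented.
class TextUtil {
    /** Replaces every occurrence of target in text with replacement. */
    static String replace(String text, String target, String replacement) {
        if (target.length() == 0) {
            return text; // nothing sensible to search for
        }
        StringBuffer out = new StringBuffer();
        int from = 0;
        int hit;
        while ((hit = text.indexOf(target, from)) != -1) {
            out.append(text.substring(from, hit)); // text before the match
            out.append(replacement);
            from = hit + target.length();          // continue after the match
        }
        out.append(text.substring(from));          // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("Tom &amp; Jerry", "&amp;", "&")); // prints: Tom & Jerry
    }
}
```

Because the search continues after each inserted replacement, a replacement that contains the target cannot cause an endless loop.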
Yes, this output is correct for this approach to removing tags.
The problem you see comes from the XML comment in the document, the <!-- ... -->. Since this simple mechanism only looks for < and > pairs, the extra < in the <!-- breaks the pairing for everything in between, which is why you get those mso-... style tags back as 'content'.
A step forward could be removing the XML comments first, then the tags.
You can simply re-use the code: just replace < with <!-- and > with -->, and adjust the lengths. Because of the identical structure of the two loops, you can of course extract the whole thing into a method if you like.
Code:
// First remove the XML comments
while (content.indexOf("<!--") != -1) {
    int beginTag = content.indexOf("<!--");
    int endTag = content.indexOf("-->", beginTag);
    if (endTag == -1) {
        break; // unterminated comment
    }
    content = content.substring(0, beginTag) + content.substring(endTag + 3);
}
// Then remove the remaining tags
while (content.indexOf("<") != -1) {
    int beginTag = content.indexOf("<");
    int endTag = content.indexOf(">", beginTag);
    if (endTag == -1) {
        break; // unterminated tag
    }
    content = content.substring(0, beginTag) + content.substring(endTag + 1);
}
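Extracting the two loops into one method could look roughly like the sketch below. The names TagStripper, removeDelimited and stripMarkup are hypothetical, and the sample string is made up. It is still the same simple pairing approach, not a real HTML parser, so a literal < or > inside the text will confuse it.

```java
// Sketch only: TagStripper, removeDelimited and stripMarkup are hypothetical names.
class TagStripper {
    /**
     * Removes every span that starts with open and ends with the next close.
     * Stops at the first span that is never closed and leaves the rest untouched.
     */
    static String removeDelimited(String text, String open, String close) {
        int begin;
        while ((begin = text.indexOf(open)) != -1) {
            // search for the closing delimiter only after the opening one
            int end = text.indexOf(close, begin + open.length());
            if (end == -1) {
                break; // unterminated span
            }
            text = text.substring(0, begin) + text.substring(end + close.length());
        }
        return text;
    }

    /** Removes XML comments first, then the remaining tags. */
    static String stripMarkup(String html) {
        String withoutComments = removeDelimited(html, "<!--", "-->");
        return removeDelimited(withoutComments, "<", ">");
    }

    public static void main(String[] args) {
        // Made-up input resembling a Word-style conditional comment.
        String content = "Pranab <!--[if gte mso 9]><xml>x</xml><![endif]-->said <span>hello</span>.";
        System.out.println(stripMarkup(content)); // prints: Pranab said hello.
    }
}
```

Using open.length() and close.length() replaces the hard-coded + 1 and + 3 offsets, so the same method serves both delimiter pairs.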