Author Topic: Newzleech spaces in results poss solution?  (Read 5240 times)

Offline Moose

  • Contributor
  • ***
  • Posts: 10
Newzleech spaces in results poss solution?
« on: October 06, 2008, 12:25:00 AM »
We all know the newzleech thing about their server adding spaces into results. The only time it (for me at least) seems to throw a problem with the title parsing is when it is in the middle of the title name. For example:

[84988]-[#altbin@EFNet]-[FULL]-[Stereophonics-Performance_And_Cocktails-1999-EOS]-[ 17/22] - "Stereophonics-Performance_And_Cocktails-1999-EOS.par2" yE

parses correctly to give the proper title of Stereophonics-Performance_And_Cocktails-1999-EOS. You can see here that the space is inserted by the newzleech servers right before the '17', so alt.binz doesn't have a problem parsing the title correctly.

In this case, however:

[13263]-[#altbin@EFNet]-[FULL]-[The.Sopranos.The.Complete.Third.Season.Volume4.EP13 .DVDrip.XviD-SMB]-[01/35] - "the.sopranos.the.complete.third.season.volume4.ep13.dvdrip.par2" yEnc (159/159)

the title parses incorrectly, because the space has been inserted in the middle of the title, right after 'ep13', breaking the parser so it has to be either reconstructed manually, the whole lot copied, pasted, and tidied up properly, or done through another search engine.

I think I have a solution, though. In each case (and every other case that I've checked), the space is inserted by newzleech's servers after character #83, so the space itself is character #84. Since I've never had alt.binz fail to parse a title correctly from anything other than the #altbin@EFNet posts, and these are pretty frequently uploaded, it seems to be something of an isolated issue (every other poster puts bloody spaces in!).

Could alt.binz maybe detect that the post string contains '#altbin@EFNet' and, if it does, delete the space that's at char #84 and then parse after doing that?
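
A minimal sketch of that idea in Python, just to make the suggestion concrete. The function name and parameters are made up for illustration; this is not how alt.binz actually parses subjects, and the position 84 is only what the two examples above show.

```python
def strip_inserted_space(subject, marker="#altbin@EFNet", position=84):
    """Remove one space at 1-based `position`, but only if `subject` contains `marker`.

    `marker` and `position` are assumptions taken from the examples in this post.
    """
    index = position - 1  # convert the 1-based position to a 0-based index
    if marker in subject and len(subject) > index and subject[index] == " ":
        return subject[:index] + subject[index + 1:]
    # No marker, or no space at that position: leave the subject alone
    return subject
```

Subjects that lack the marker, or that have some other character at that position, come back unchanged, so a wrong guess should do no harm.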

I understand this is a long post to explain and suggest a solution for what seems like a rather isolated incident, but it seems to come up more often than I expect, and I wonder if the fix might be as simple as I've suggested.

Thanks again for such an epic program :)

Offline mysteryman

  • Contributor
  • ***
  • Posts: 66
Re: Newzleech spaces in results poss solution?
« Reply #1 on: October 12, 2008, 06:21:54 AM »
Hrmm... Very interesting. I had seen it a few times when using newzleech, but I just ignored it. At the time, I figured that it was an obfuscation technique used by the poster. Even if it is not fixed, at least I learned the cause of this. However, I will agree, it seems to happen often enough to be very tedious. That is, assuming the space isn't being added on altbinz's side by misinterpreting the page... perhaps a <br> or line break in the page being parsed.

Obviously if newzleech is the one doing it, they SHOULD fix it... but I have no idea if they have a 'good' reason for this. I also do not know if they are good about making changes to fix things, especially things that get them no advertisement revenue. If they did it to speed up searches, it would be simpler to make the index shorter, rather than the field length... but that's just me.

If they are not willing to fix it, I give it a very long-winded +1... except, don't search for altbin@efnet ...instead make it generic. First, check whether we are using a 'broken' search such as newzleech (sounds broken to me); second, search for the first occurrence of ' ', and if the FIRST occurrence of a space is at char c84 (?), remove it. I have a feeling this would pull up very few false positives, and in the occasional case where it does, the user can easily fix a few chars by hand.
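
Roughly what I mean, as a Python sketch. The function name is hypothetical and the column is whatever turns out to be right for the provider; passing None stands for "this provider is not known to be broken".

```python
def remove_first_space_at(subject, column=None):
    """Drop the first space in `subject` only if it sits at 1-based `column`.

    `column=None` means the search provider is not on the 'broken' list,
    so the subject is returned untouched.
    """
    if column is None:
        return subject
    first = subject.find(" ")  # 0-based index of the first space, or -1
    if first == column - 1:
        return subject[:first] + subject[first + 1:]
    return subject
```

Because only the FIRST space is considered, a subject that already contains a real space earlier on is never modified.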

I just did a test or two. It is nowhere near exhaustive, so you might want to double-check with more searches before taking the information as confirmed. In other words, this is how it happened on two searches; it may change in certain situations. In my search I saw:

newzleech at c77.

binsearch at c71.

yabse at 47 and 126 (?!) *Very strange to note... yabse actually replaces the char with a space, rather than adding a space after it, i.e. Retail becomes etail if the R fell on the column in question. Also, the 126 only happened once (half the tests), so it may be occasional. Guessing the right character to put back would be a bit haphazard anyway. Probably better just to show a warning when you search it.

NLSS does not seem to insert spaces, but it returns all lowercase letters in its searches... this is very annoying, and lacks the 'authenticity' of the original. Ironic that the one reason I paid for newsleecher is the thing I ended up disliking the most about it. It might be nice to warn users (with a 'do not show this again' option) during their first NLSS search.

I wasn't able to get beta.binaries.nl or aeton to work, but I didn't try hard, as I don't use them anyway.

PS. One of the mods will tell you shortly to edit your post. We do not care what you downloaded, only what altbinz does/should do. You should use some filler for the names listed.

Offline therealjoeblow

  • Contributor
  • ***
  • Posts: 84
Re: Newzleech spaces in results poss solution?
« Reply #2 on: October 17, 2008, 04:20:11 PM »
The solution to this problem that I've been using with 100% success for some time now is very simple - shorten your search string, and use only a unique portion of it, not the whole string. 

For the types of posts you're talking about, the string "[13263]-[#altbin@EFNet]" will find your files without fail on both Newzleech and Binsearch, whereas the whole string will probably fail in many cases.  That's one of the major benefits of the indexed posting system that the EFNet posters use - the numerical identifier is unique.  You can usually come up with a few unique words for most things you're looking for rather than writing a novel in the search box, and that generally works much better.

Cheers,
The REAL Joe

Offline mysteryman

  • Contributor
  • ***
  • Posts: 66
Re: Newzleech spaces in results poss solution?
« Reply #3 on: October 17, 2008, 10:22:58 PM »
I think you confused the "problem" with the cause. True, the ACTUAL problem is that there is a space in the results, but there is nothing we can do about that ... So assuming newzleech and the other providers do not fix it themselves... The major problem is that once you DO get results (by searching a small string) altbinz will guess the wrong name for you, and you will have to fix/retype it yourself.

If you reread my post, I suggested that altbinz have a programmed 'list' of 'broken' search providers and how to 'fix' their results. This could be applied to the results themselves, or simply to the name detection. It will not fix searching for an entire line (no way to do that client side)... but who does that? Though, once you do get results, if the space has been removed or ignored, altbinz will detect the correct folder name.
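
To make the 'list' idea concrete, here is a rough Python sketch. Everything in it is hypothetical: the table name, the function name, and especially the columns, which are just the ones reported so far in this thread (84 and 77 for newzleech, 71 for binsearch) and are not confirmed. yabse is left out on purpose, because it replaces a character instead of inserting one, so there is nothing to remove.

```python
# Columns reported in this thread only; treat them as unverified assumptions.
KNOWN_WRAP_COLUMNS = {
    "newzleech": (77, 84),
    "binsearch": (71,),
}


def subject_for_name_detection(provider, subject):
    """Return a copy of `subject` to use for folder-name detection only.

    If the first space sits at a column listed for `provider`, it is dropped.
    The caller keeps the original subject for display.
    """
    first = subject.find(" ")  # 0-based index of the first space, or -1
    for column in KNOWN_WRAP_COLUMNS.get(provider.lower(), ()):
        if first == column - 1:
            return subject[:first] + subject[first + 1:]
    return subject
```

Keeping the cleaned copy separate from the displayed result means a wrong table entry can only affect the suggested folder name, never the result list itself.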

Offline Hecks

  • Contributor
  • ***
  • Posts: 2011
  • naughty cop
Re: Newzleech spaces in results poss solution?
« Reply #4 on: October 17, 2008, 10:38:24 PM »
Why the confusion between how the results are displayed on the site (with spaces to force wrapping), and how they're stored in the database?  Anyone have any problems with actual search strings?  Examples?


Offline mysteryman

  • Contributor
  • ***
  • Posts: 66
Re: Newzleech spaces in results poss solution?
« Reply #5 on: October 17, 2008, 10:45:27 PM »
I have not tested whether the spaces are stored in the database... that was brought up by therealjoeblow. I could check later; it is easy enough to test. Our issue (mine and the OP's) is with the parsing of the directory name. Since newzleech likely won't fix something for the sole use of external programs that bypass their advertisements... we hoped that altbinz could detect the fake space (whether it is there for word wrap or for easier search processing) and remove it... either for results+dirname, or for dirname detection only.

Offline Hecks

  • Contributor
  • ***
  • Posts: 2011
  • naughty cop
Re: Newzleech spaces in results poss solution?
« Reply #6 on: October 17, 2008, 10:48:29 PM »
Believe it or not, I can actually read.

Offline mysteryman

  • Contributor
  • ***
  • Posts: 66
Re: Newzleech spaces in results poss solution?
« Reply #7 on: October 17, 2008, 11:13:26 PM »
:) sorry... I guess that comes with my job (pc tech) ... it's always the other guy's fault.  ;D

Offline Jusher

  • Contributor
  • ***
  • Posts: 6
Re: Newzleech spaces in results poss solution?
« Reply #8 on: October 18, 2008, 01:13:53 AM »
The thing seems to be a little more complex than it looks.
As far as I can tell, for Binsearch, that space ALWAYS comes up after the 70th character (making it the 71st character) no matter how long or short your search string is.
For Newzleech however, there is a difference. I can't find a connection between the search string and the place where the space is inserted. No matter how long or short the search string, you will never get the space completely removed (unless the space would fall so far back, that before it, there already is a "real" space in the header, making the "unreal" space unnecessary). This seems to make it impossible for alt.binz to do something about this problem however. And even if the search string and the space are somehow connected, the work involved with that would be too much. The only solution is to ask the search engines, but I'm pretty sure that either they can't change it, or they don't want to because I'm sure we are not the first people complaining about this.

Good night everyone ;)
Jusher

Offline Hecks

  • Contributor
  • ***
  • Posts: 2011
  • naughty cop
Re: Newzleech spaces in results poss solution?
« Reply #9 on: October 18, 2008, 01:30:32 AM »
There's no connection at all between length of your search string (which reminds me of a joke :)) and the display of the subject line on the sites.  These spaces are inserted only to wrap the columns neatly so the tables don't disappear off the horizon.  Continuous text that's mistaken by your browser to be one long word (as in the examples above) prevents wrapping in tables, so it needs to be broken arbitrarily or dynamically, with php or javascript, at a fixed or dynamic character position.  That's it.
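
For illustration only, here is one plausible way a site could force such breaks, written in Python rather than php or javascript. The function name and the exact rule (break after a fixed number of consecutive non-space characters) are assumptions; the sites' real code is not known.

```python
def force_wrap(text, width):
    """Insert a space after every `width` consecutive non-space characters."""
    out = []
    run = 0  # length of the current unbroken run of non-space characters
    for ch in text:
        if ch == " ":
            run = 0  # a real space already lets the browser wrap here
        elif run == width:
            out.append(" ")  # break the overlong "word" artificially
            run = 0
        out.append(ch)
        if ch != " ":
            run += 1
    return "".join(out)
```

Under that rule a subject with a real space early enough never gets an artificial one, and the break position depends only on the subject text, not on what was typed in the search box.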

As you've no doubt discovered by now, this doesn't affect searching in the slightest, only the naming of NZBs by Alt.Binz due to the regexes it uses.

Offline mysteryman

  • Contributor
  • ***
  • Posts: 66
Re: Newzleech spaces in results poss solution?
« Reply #10 on: October 18, 2008, 09:40:33 AM »
Hrmm... I might have just gotten it fixed. I reported it on their forum, and they were much more helpful than I would have expected. A few hours after I reported it, the admin came on and said he was going to try to improve it. I did a search recently, and one line that had an issue before was fixed.

It seems to work right now... but it may not be a complete fix. We probably won't know for sure until we get newzleech working again in altbinz, and play with it a bit.

oh, and btw, the reason the length changed depending on search was because the limit was not 70 something. It was 90 characters, including html (&amp; ... etc).
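
A small Python sketch of why counting the html makes the visible position move around. It assumes the limit is applied to the escaped text and that only &, < and > are escaped (which is what Python's html.escape does with quote=False); the function name is made up, and it does not try to reproduce the exact columns people reported above.

```python
import html


def inserted_space_position(subject, limit=90):
    """Return the 1-based visible position where a space would be inserted,
    if the site counts `limit` characters of the HTML-escaped subject.

    Returns None when the subject is too short to be broken.
    """
    used = 0  # escaped characters consumed so far
    for i, ch in enumerate(subject):
        used += len(html.escape(ch, quote=False))  # "&" counts as "&amp;" (5)
        if used >= limit:
            # The space would follow visible character i (0-based)
            return i + 2 if i + 1 < len(subject) else None
    return None
```

With no entities the break lands after 90 visible characters; each & before the break pulls it 4 characters earlier, which would look like a moving column when you only count what is displayed.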

I looked on binsearch site, and I don't see a forum, but the html chars part might be key to decoding the char position for the space on binsearch and others.

Offline Hecks

  • Contributor
  • ***
  • Posts: 2011
  • naughty cop
Re: Newzleech spaces in results poss solution?
« Reply #11 on: October 18, 2008, 09:56:19 AM »
Good job.  8)

Offline mysteryman

  • Contributor
  • ***
  • Posts: 66
Re: Newzleech spaces in results poss solution?
« Reply #12 on: October 18, 2008, 12:20:37 PM »
Unfortunately, it does nothing for binsearch or yabse... but I think binsearch is significantly more popular. Do they have an IRC room, or an unofficial forum somewhere?

Offline Jusher

  • Contributor
  • ***
  • Posts: 6
Re: Newzleech spaces in results poss solution?
« Reply #13 on: October 19, 2008, 12:21:03 PM »
Btw. I think the fastest way to fix it manually is to copy the part that alt.binz suggests, then move up/down with the arrow keys to the other part of the release name, paste the original text and remove the front. Or, you could just leave it and once in a while remove all the unneeded altbin@efnet stuff with a folder renamer or something :P

Greetz
Jusher

EDIT: For Newzleech, the spaces really seem to be gone. All the spaces that I looked at yesterday are now gone. Wouldn't have thought it'd be so easy.
« Last Edit: October 19, 2008, 12:25:52 PM by Jusher »