Server Help Forum Index Server Help
Community forums for Subgame, ASSS, and bots
 
My CAPTCHA image decoding quest!
Solo Ace
Yeah, I'm in touch with reality...we correspond from time to time.


Age: 30
Gender: Male
Joined: Feb 06 2004
Posts: 2583
Location: The Netherlands

Posted: Mon Jun 19, 2006 3:45 am    Post subject: My CAPTCHA image decoding quest!

I think it's interesting how some software can determine what the digits are in CAPTCHA images.
I started working on decoding some easy CAPTCHAs too.

My school's intranet site's security was "increased" by another student from another part of the country (uhm, why is it an intranet site when it allows external logins? Isn't that an extranet?)
He's even getting paid for it.
His PHP code is generating this poor captcha image:



Before this I'd never done any image programming, except for a little GD2 with PHP, and I'm not going to read any manuals or tutorials; I'm fine on my own, it seems.

I'm just reading the image and doing this for every pixel (pseudocode):
Code:
if ((R == 255 && G == 204 && B == 255) ||  // background
    (R == 69 && G == 58 && B == 40) ||     // lines
    (R < 250 && G > 10 && B > 10))
    R = G = B = 255;                       // paint the pixel white


What we have left is easily readable by an OCR application, like GOCR for example.
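
A minimal C++ sketch of that per-pixel filter, assuming the image has already been loaded into an interleaved 8-bit RGB buffer. The names `is_noise` and `whiten_noise` are made up for illustration; only the three color rules come from the pseudocode above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True when a pixel matches one of the three rules from the pseudocode.
bool is_noise(int r, int g, int b)
{
    const bool background = (r == 255 && g == 204 && b == 255);
    const bool line       = (r == 69 && g == 58 && b == 40);
    const bool other      = (r < 250 && g > 10 && b > 10);
    return background || line || other;
}

// Paint every matching pixel white and leave all other pixels untouched.
// `rgb` holds interleaved R,G,B bytes (an assumption of this sketch).
void whiten_noise(std::vector<std::uint8_t> &rgb)
{
    assert(rgb.size() % 3 == 0);
    for (std::size_t i = 0; i < rgb.size(); i += 3)
    {
        if (is_noise(rgb[i], rgb[i + 1], rgb[i + 2]))
            rgb[i] = rgb[i + 1] = rgb[i + 2] = 255;
    }
}
```

Whatever survives the filter keeps its original color, so an OCR tool sees colored glyphs on a white background.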



I convert the bitmap to a PPM (otherwise GOCR will barf), run "gocr.exe intranet2.ppm > intranet2.txt" and find "JeDADck" in intranet2.txt.
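
For reference, binary PPM ("P6") is simple enough to write by hand: a short text header followed by raw RGB bytes. A rough sketch, assuming an interleaved 8-bit RGB buffer; `encode_ppm` is a hypothetical helper, not something from the tools used in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encode interleaved 8-bit RGB pixels as a binary PPM (magic number "P6").
std::string encode_ppm(int width, int height, const std::vector<std::uint8_t> &rgb)
{
    assert(width > 0 && height > 0);
    assert(rgb.size() == static_cast<std::size_t>(width) * height * 3);

    // Header: magic number, dimensions, maximum channel value,
    // then exactly one whitespace byte before the pixel data.
    std::string out = "P6\n" + std::to_string(width) + " " +
                      std::to_string(height) + "\n255\n";

    // Body: raw bytes, top row first, three bytes per pixel.
    out.append(reinterpret_cast<const char *>(rgb.data()), rgb.size());
    return out;
}
```

When writing the result to disk on Windows, open the file in binary mode (for example `std::ofstream f(path, std::ios::binary)`), otherwise newline translation will corrupt the pixel data.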

I think it took me about an hour (yeah, I know that's long) to figure everything out, but I think it's fun. :)

Solo Ace
Posted: Mon Jun 19, 2006 4:09 am

Mine GO BOOM wrote:
or just get a good programmer. PWNtcha is very private, but works great. Give it a shot, download a captcha picture from your registration and submit it there. They say it is 97% for phpbb.


Well, I'm not a "good programmer", I'm not even a programmer, but a Belgian guy at my school asked me to look at the CAPTCHA images for recruiting people, or something, on the web-based game www.kingsofchaos.com.

I have no idea what the game is about really, but I gave the CAPTCHA of Age 5 a shot.





Well, I had no idea what to do, so I just removed the lines.





Still not very clean, but that was easy to solve.





And that's how far I got for "Age 5". The guy from my school told me I should check for the new age, because decoding these CAPTCHAs was useless, so I stopped here.

I think the reason I'm keeping the images as bitmaps is obvious.




Attachments:
koc_age5_ten3.png - 0.49 KB
koc_age5_ten2.png - 17.06 KB
koc_age5_ten1.png - 21.53 KB
koc_age5_fourteen3.png - 0.8 KB
koc_age5_fourteen2.png - 14.71 KB
koc_age5_fourteen1.png - 17.09 KB

SpecShip
Complete twat

Gender: Male
Joined: Dec 17 2005
Posts: 514
Location: 8025 - Spec Freq

Posted: Mon Jun 19, 2006 4:38 am

You enjoy terrorizing Cerium (dial-up user)?

I don't care what your excuse is; posting BMPs on forums should get you hanged unless it's a 3 KB file sized 30x30 or something...
Or an SS banner.


ROFL @ kings game...browser games are for gay handicapped people.

Solo Ace
Posted: Mon Jun 19, 2006 4:51 am

Uh, if I compressed the data, the results of what I've done to the images would be a little vague, so I kept the original images.
There's no point in trying to make a car faster/lighter by removing the engine or whatever keeps the car going.

I'm working on Age 6 now, which seems to be even easier.
kingsofchaos.com wrote:
Updated CAPTCHAs to deter automatic scripts


Yay?

I'm sorry Cerium, just ignore this thread. :P

Mine GO BOOM
Hunch Hunch
What What

Age: 34
Gender: Male
Joined: Aug 01 2002
Posts: 3603
Location: Las Vegas

Posted: Mon Jun 19, 2006 4:52 am

Solo Ace wrote:
I think the reason I'm keeping the images as bitmaps is obvious.

Unless you are actually using that server to run gocr, and even then, I'd still recommend you use another image format, such as png for your posts.

BMP: 1,620,324 bytes
PSP's PNG batch conversion: 165,518 bytes
PNGcrush: 117,265 bytes

So a simple batch conversion will cut it down to 1/10th the size, while using pngcrush will knock it down even smaller.

Converting to grey scale, as that is all the images really are, and you get down to under 5% of the original BMP size. And you know what, every single image is still pixel-for-pixel identical to the original BMP files.
PNGcrush: 73,392 bytes

EDIT: Screw it, I replaced the images because those with dialup would wait too damn long for nothing.

Smong
Server Help Squatter

Joined: 1043048991
Posts: 0x91E

Posted: Mon Jun 19, 2006 7:53 am

What is Age 5/6? Is that a game or a captcha system? Also with ten/fourteen you didn't explain how you removed the lines even though they are the same color as the text.

Solo Ace
Posted: Mon Jun 19, 2006 10:12 am

Fine MGB, I realized they'd still be identical, but whatever, I still wanted to keep what I worked with.

I don't really know what the age 5/6 are.
Uh it's just... a time, I guess.

Quote:
Play is segmented into ages, which do not have a set length but so far have lasted for about six months. Age 1 launched in early January of 2003, and Age 2 launched in late August the same year. Age 2 ended February 2004, and was followed by a public beta of Age 3. The full version of Age 3 was launched on July 15, and Age 4 began on February 21, 2005. The fifth age was launched on October 9, 2005. Age 6 began on May 14, 2006.


The game changes when the Age changes, and they 'upgraded' the captcha system for Age 5 to 6 it seems.

I didn't really use magic to remove the lines; the lines are simply 1 or 2 pixels wide. I scanned across the image looking for the lines and removed them wherever they matched the "1 or 2 pixels wide" criterion.

I'm too embarrassed to show code, it's awful. :P
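
Since no code was posted for this step, here is only a guess at what such a scan could look like, not the method actually used. It assumes a grayscale image where values below 128 count as ink, and it removes any ink pixel whose horizontal or vertical run of ink is at most `max_width` pixels long. The names `run_length` and `remove_thin_strokes` are invented for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Length of the unbroken run of ink pixels through (x, y) along (dx, dy).
static int run_length(const std::vector<std::uint8_t> &img, int w, int h,
                      int x, int y, int dx, int dy)
{
    auto ink = [&](int px, int py) {
        return px >= 0 && py >= 0 && px < w && py < h &&
               img[static_cast<std::size_t>(py) * w + px] < 128;
    };
    int n = 1; // the pixel itself
    for (int i = 1; ink(x + i * dx, y + i * dy); ++i) ++n;
    for (int i = 1; ink(x - i * dx, y - i * dy); ++i) ++n;
    return n;
}

// Whiten ink pixels that sit on a stroke no wider than max_width.
std::vector<std::uint8_t> remove_thin_strokes(const std::vector<std::uint8_t> &img,
                                              int w, int h, int max_width = 2)
{
    assert(w > 0 && h > 0);
    assert(img.size() == static_cast<std::size_t>(w) * h);

    std::vector<std::uint8_t> out(img); // read from img, write to the copy
    for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x)
    {
        const std::size_t idx = static_cast<std::size_t>(y) * w + x;
        if (img[idx] >= 128)
            continue; // background pixel, nothing to do

        const int horiz = run_length(img, w, h, x, y, 1, 0);
        const int vert  = run_length(img, w, h, x, y, 0, 1);
        if (std::min(horiz, vert) <= max_width)
            out[idx] = 255;
    }
    return out;
}
```

Pixels where a line crosses a thick glyph have long runs in both directions, so they survive. The catch is that glyph strokes thinner than three pixels would be erased as well.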

Bak
?ls -s
0 in

Age: 18
Gender: Male
Joined: Jun 11 2004
Posts: 1826
Location: USA

Posted: Mon Jun 19, 2006 7:14 pm

Looks like you could attempt to make the smallest rectangle possible around the text, then rotate it so that the rectangle is parallel to the bottom of the image... then try the OCR. It also looks like they're using dashes; you might want to scan for those.

Solo Ace
Posted: Tue Jun 20, 2006 1:54 am

That's exactly what I wanted to do, but it'd be useless now. :(

I'm working on the "Age 6" CAPTCHA, which looks like this:


I have several ideas for this one already, I think I'm pretty close to filling the digits completely and I can easily get rid of the pixels unrelated to the digits.

Quan Chi2
Member of "Sexy Teenagers that Code" Group

Age: 27
Gender: Male
Joined: Mar 25 2005
Posts: 860
Location: NYC

Posted: Tue Jun 20, 2006 1:59 pm

Wow, thanks Solo. I always wondered what to search for to find out what those images were and how they are generated. I'm reading some articles on Google about it now.

Solo Ace
Posted: Tue Jun 20, 2006 2:40 pm

A CAPTCHA (an acronym for "completely automated public Turing test to tell computers and humans apart", trademarked by Carnegie Mellon University) is a type of challenge-response test used in computing to determine whether or not the user is human.

Well, I just want to be able to break some, I won't abuse it when I can.
I'm about to show my school's principal a brute-force of his account password, though, since he challenged me. :)

Quan Chi2
Posted: Tue Jun 20, 2006 8:29 pm

Don't do that. I told the technology admin that the security was weak - both firewalls and filters. He banned me from the school network the second the filters shut down without a trace.. -.-

So I wouldn't open my mouth unless it was urgent. Just act like you don't know how to do it when you confront your principal. lol

SpecShip
Posted: Wed Jun 21, 2006 3:45 am

Well, I have no idea how to put it code-wise, but logic-wise, it would seem to me that the key is to query the distance between the dots, then lock down on clusters where the density is high.

Quan Chi2
Posted: Wed Jun 21, 2006 11:01 am

What?

SpecShip
Posted: Wed Jun 21, 2006 11:18 am

My apologies Quan.
My messages carry a hidden requirement of ReaderIQ>100 in order to understand.

Mine GO BOOM
Posted: Wed Jun 21, 2006 1:24 pm

SpecShip wrote:
Well, I have no idea how to put it code-wise, but logic-wise, it would seem to me that the key is to query the distance between the dots, then lock down on clusters where the density is high.

You mean like this? All done with a distance range of 10.

Threshold 5:


Threshold 7:


Threshold 10:




Attachments:
captcha-010-010.png - 1.51 KB
captcha-010-007.png - 1.75 KB
captcha-010-005.png - 2.42 KB

Mine GO BOOM
Posted: Wed Jun 21, 2006 1:26 pm

And so people actually see it, here is the source code/executable for the program that did the above. Either run captcha.exe, or pass it parameters in the format of <imagename> <distance> <threshold>

Captcha.zip

Uses Corona as the image loading/saving library, because it's my favorite for simple pixel control and loads a ton of different image formats.




Captcha.zip - 141.07 KB

Quan Chi2
Posted: Wed Jun 21, 2006 1:47 pm

Amazing.. lol

Solo Ace
Posted: Thu Jun 22, 2006 3:40 pm

Nice MGB. I'm stuck now haha.

I've tried some things, but I'm an idiot, it didn't work well. Well, not "well", it didn't work at all.
My best attempt even killed GOCR. :(

Bak
Posted: Thu Jun 22, 2006 8:15 pm

Shrunk it down to 25% size (interpolating each point from every four), filled the completely black pixels with yellow, then expanded it back 400% to normal size... looks like that will isolate the numbers:
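
The shrink-and-expand step could be sketched roughly as below. This is one reading of "interpolate each point from every four" (average each block, then repeat pixels to scale back up), not necessarily what the image editor did. The function names and the `block` parameter are invented here, and the sketch assumes a grayscale buffer whose width and height are multiples of `block`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shrink a w x h grayscale image by averaging each block x block tile
// into a single output pixel.
std::vector<std::uint8_t> shrink_by_averaging(const std::vector<std::uint8_t> &img,
                                              int w, int h, int block)
{
    assert(block > 0 && w > 0 && h > 0 && w % block == 0 && h % block == 0);
    assert(img.size() == static_cast<std::size_t>(w) * h);

    const int ow = w / block, oh = h / block;
    std::vector<std::uint8_t> out(static_cast<std::size_t>(ow) * oh);
    for (int oy = 0; oy < oh; ++oy)
    for (int ox = 0; ox < ow; ++ox)
    {
        int sum = 0;
        for (int dy = 0; dy < block; ++dy)
        for (int dx = 0; dx < block; ++dx)
            sum += img[static_cast<std::size_t>(oy * block + dy) * w + (ox * block + dx)];
        out[static_cast<std::size_t>(oy) * ow + ox] =
            static_cast<std::uint8_t>(sum / (block * block));
    }
    return out;
}

// Expand a w x h image back up by repeating each pixel block x block times.
std::vector<std::uint8_t> expand_by_repeating(const std::vector<std::uint8_t> &img,
                                              int w, int h, int block)
{
    assert(block > 0 && w > 0 && h > 0);
    assert(img.size() == static_cast<std::size_t>(w) * h);

    const int ow = w * block, oh = h * block;
    std::vector<std::uint8_t> out(static_cast<std::size_t>(ow) * oh);
    for (int y = 0; y < oh; ++y)
    for (int x = 0; x < ow; ++x)
        out[static_cast<std::size_t>(y) * ow + x] =
            img[static_cast<std::size_t>(y / block) * w + (x / block)];
    return out;
}
```

Averaging smears isolated dots into mid-gray while dense areas stay near their original value, which is why a fill or threshold between the two steps can separate the digits from the noise.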


Mine GO BOOM
Posted: Thu Jun 22, 2006 10:49 pm

A simple fix. Right now, the program will only check pixels that are already active. Just remove the if (bitmap[GetXY(col, row, cols)]) statement, and change the color outputs so they are inverted. The if(count) goes to black (0) now, while the else goes to white (255).
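
As a standalone illustration of the modified filter, here is a simplified sketch. It is not the code from Captcha.zip: it scans a full window of plus or minus `radius` with a circular distance check, so its counts will differ from the posted program for the same numbers. It assumes a grayscale buffer where any non-zero pixel counts as active; `density_filter` is a made-up name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// For every pixel, count the active (non-zero) pixels within `radius`.
// Dense neighborhoods become black (0); everything else becomes white (255).
std::vector<std::uint8_t> density_filter(const std::vector<std::uint8_t> &img,
                                         int w, int h, int radius, int threshold)
{
    assert(w > 0 && h > 0 && radius > 0 && threshold > 0);
    assert(img.size() == static_cast<std::size_t>(w) * h);

    std::vector<std::uint8_t> out(img.size(), 255);
    for (int row = 0; row < h; ++row)
    for (int col = 0; col < w; ++col)
    {
        int count = 0;
        // Scan the bounding square of the circle, clipped to the image.
        for (int y = std::max(row - radius, 0); y <= std::min(row + radius, h - 1); ++y)
        for (int x = std::max(col - radius, 0); x <= std::min(col + radius, w - 1); ++x)
        {
            const int dx = x - col, dy = y - row;
            if (dx * dx + dy * dy <= radius * radius &&
                img[static_cast<std::size_t>(y) * w + x] != 0)
                ++count;
        }
        if (count >= threshold)
            out[static_cast<std::size_t>(row) * w + col] = 0;
    }
    return out;
}
```

Because every pixel is tested rather than only the active ones, gaps between nearby dots get filled in, which is what turns a dotted digit into a solid shape.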

Running with the radius of 15 and threshold of 15 comes out with:


Run that through gocr -v 7 -c _ -C "1234567890" captcha-015-015.png
Code:
# Optical Character Recognition --- gocr 0.40
# options are: -l 0 -s 0 -v 7 -c _ -m 0 -d -1 -n 0 captcha-015-015.png
# using unicode
# popen( pngtopnm captcha-015-015.png )
# PNM P6 x=360 y=220 c=255 head=-1
# db_path= (null)
# OTSU: thresholdValue = 0 gmin=0 gmax=255
# scanning boxes 6
# auto dust size = 2 (mX=28,mY=37)
# remove dust of size  2 histo=3,0(?=0),0(?=0),...   3 cluster removed
#   4 white pixels removed, cs=160
# smooth big chars 7x16 cs=160 ... 122 changes in 3 of 3
# detect barcode , 0 bars, boxes-0=3
# detect pictures, frames, noAlphas, mXmY= 55 73 ...  0 - boxes 3
# averages: mXmY= 55 73 nC= 3 n= 3
# remove boxes on border pictures= 0  rest= 3  boxes?= 3
deleted= 0, within pictures  pictures= 0  rest= 3  boxes?= 3
. deleted= 0, pictures= 0  rest= 3  boxes?= 3
# rotation angle (x,y,num) (100352,5120,1) (0,0,0), pass 1
# rotation angle (x,y,num) (100352,5120,1) (0,0,0), pass 2
# detect longest line - at y=0 crosses=  0 my=0 - at crosses=  0 dy=0
# scanning lines # trouble on line 1:
#  bounds: m1=  46 m2=  11 m3= 104 m4= 104  my=  73
#  counts: i1=   1 i2=   1 i3=   0 i4=   2
#  all boxes of same high!
- lines= 1
# add line infos to boxes ... done
# divide vertical glued boxes, numC 3
# searching melted serifs ...   0 cluster corrected, 0 new boxes
# glue broken chars ...   0 times glued, remaining boxes 3
# detect dust2, ...    1 +   0 boxes deleted, numC= 2
# check for word pitch ... min=28 max=28 pitch_p=29
#  ...  no spaces found
# step 1: char recognition unknown= 2 picts= 0 boxes= 2, 1 of 2 chars unidentified
# debug: unknown= 1 picts= 0 boxes= 2
# step 2: try to compare unknown with known chars - found 0
# step 3: try to divide unknown chars, numC 2

# list shape   0 x=  80   46 d= 68 105 h=1 o=1 dots=0 e000 (?)
# list box dots=0 c=(?) ac=(?) mod=(0x00) line=1 m= 0 10 104 114 r= 35 0
# list pattern   x=  80   46 d= 68 105 t=1 3
...................................@@@..............................<-
...............................@@@@@@@@@.@@@@.......................
.............................@@@@@@@@@@@@@@@@.......................
.........................@@@@@@@@@@@@@@@@@@@@@@@@@..................
......@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.................
....@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@................
....@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.................
...@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@................
...@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.................
....@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@..................
.....@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@...................
...........@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@..................
...................@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@..................
....................@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.................
.......................@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@...............
.......................@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@...............
.....................@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@...............
......................@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@................
......................@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.................
.....................@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.................
.....................@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.................
......................@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.................
......................@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@................
.........................@@@@@@@@@@@@@@@@@@@@@@@@@@.................
........................@@@@@@@@@@@@@@@@@@@@@@@@@@@.................
......................@@@@@@@@@@@@@@@@@@@@@@@..@@...................
.......................@@@@@@@@@@@@@@@@@@@@@@.......................
..................@@@@@@@@@@@@@@@@@@@@@@@@@@@.......................
......@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.........@......
......@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@....
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@..
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
..@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@..
....@@@......@@@..............@@@@@@@@@@@@@@..........@@@@@@@@@.....
# set space width to 29
# insert space between words (dy=115) ... found 0
# step 4: context correction Il1 0O
# store boxtree to lines ...get_least_line_indent: page_width 360, dy 0
Line 1,  y 46, raw indent 80, adjusted indent 80
Minimum indent is 80
... 1 lines, boxes= 1, chars= 1
# debug: (_)= 1 picts= 0 chars= 1 (4)=1
Elapsed time: 0:01:19.787.

Well, it has a problem with the one, which is not surprising. Reduce the image to a quarter of its size (360x220 down to 90x55) and you get the following:

gocr -v 7 -c _ -C "1234567890" captcha-015-015-sm.png
Code:
# Optical Character Recognition --- gocr 0.40
# options are: -l 0 -s 0 -v 7 -c _ -m 0 -d -1 -n 0 captcha-015-015-sm.png
# using unicode
# popen( pngtopnm captcha-015-015-sm.png )
# PNM P6 x=90 y=55 c=255 head=-1
# db_path= (null)
# OTSU: thresholdValue = 0 gmin=0 gmax=255
# scanning boxes 3
# auto dust size = 1 (mX=13,mY=18)
# remove dust of size  1 histo=0,0(?=0),0(?=0),...   0 cluster removed
#   0 white pixels removed, cs=160
# smooth big chars 7x16 cs=160 ...  16 changes in 2 of 3
# detect barcode , 0 bars, boxes-0=3
# detect pictures, frames, noAlphas, mXmY= 13 18 ...  0 - boxes 3
# averages: mXmY= 13 18 nC= 3 n= 3
# remove boxes on border pictures= 0  rest= 3  boxes?= 3
deleted= 0, within pictures  pictures= 0  rest= 3  boxes?= 3
. deleted= 0, pictures= 0  rest= 3  boxes?= 3
# rotation angle (x,y,num) (25600,1024,1) (0,0,0), pass 1
# rotation angle (x,y,num) (25600,1024,1) (0,0,0), pass 2
# detect longest line - at y=0 crosses=  0 my=0 - at crosses=  0 dy=0
# scanning lines # trouble on line 1:
#  bounds: m1=  12 m2=   2 m3=  24 m4=  24  my=  18
#  counts: i1=   1 i2=   1 i3=   0 i4=   2
#  all boxes of same high!
- lines= 1
# add line infos to boxes ... done
# divide vertical glued boxes, numC 3
# searching melted serifs ...   0 cluster corrected, 0 new boxes
# glue broken chars ...   0 times glued, remaining boxes 3
# detect dust2, ...    1 +   0 boxes deleted, numC= 2
# check for word pitch ... min=9 max=9 pitch_p=10
#  ...  min=25 max=25 v=0.000000 mono=1 pitch_m=25
# step 1: char recognition unknown= 2 picts= 0 boxes= 2, 0 of 2 chars unidentified
# debug: unknown= 0 picts= 0 boxes= 2
# step 2: try to compare unknown with known chars - found 0
# step 3: try to divide unknown chars, numC 2
# set space width to 25
# insert space between words (dy=28) ... found 0
# step 4: context correction Il1 0O
# store boxtree to lines ...get_least_line_indent: page_width 90, dy 0
Line 1,  y 12, raw indent 20, adjusted indent 20
Minimum indent is 20
... 1 lines, boxes= 2, chars= 2
# debug: (_)= 0 picts= 0 chars= 2 (1)=1 (4)=1
Elapsed time: 0:00:42.897.

Returns 14. Image defeated. Now try it with a couple more sample images, and maybe even add the reducing to the code itself to make the image smaller.




Attachments:
captcha-015-015-sm.png - 0.45 KB
captcha-015-015.png - 1.78 KB

Solo Ace
Posted: Fri Jun 23, 2006 3:13 am

Bah, I feel dumb now.
Maybe programming isn't meant for me, or maybe I just need more practice or intelligence. :P

Have you compiled the GOCR yourself?

gocr -v 7 -c _ -C "1234567890ero" captcha-017-017.png gave me:
Code:
# Optical Character Recognition --- gocr 0.40
# options are: -l 0 -s 0 -v 7 -c _ -m 0 -d -1 -n 0 captcha-017-017.png
# popen( pngtopnm captcha-017-017.png )

ERROR src\pnm.c L208: sorry, compile with HAVE_POPEN to use pipes


Note: I'm using 'ero' in "1234567890ero" because I've seen an image containing "err" once, guess that's a good reason to check for it.

I'm using
Code:
FIBITMAP *image = FreeImage_Load(FIF_PNG, filename, 0);
FIBITMAP *scaled = FreeImage_Rescale(image, cols / 4, rows / 4, FILTER_BICUBIC); // returns a new bitmap
FreeImage_Save(FIF_BMP, scaled, "temp.bmp", 0); // this one's just so I can easily SEE the resized image :P
FreeImage_Save(FIF_PPM, scaled, "temp.ppm", 0);
FreeImage_Unload(scaled);
FreeImage_Unload(image); // the original bitmap has to be freed too


For some reason I only get the correct result '14' when using distance and threshold 13. Otherwise I'm getting _4 or 29.
Could this be because it's saving it as ppm?
I chose the bicubic filter for rescaling it, it's a lot smoother than bilinear, but I don't know if that really is an advantage here.

This is the code I have now.

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "corona.h"
#include "FreeImage.h"

#define IMAGE_FORMAT corona::PF_R8G8B8 /* RGB mode - 8 bits each */
#define GetXY(x,y, w)  ((x) + ((w) * (y)))
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

#define SQ(a) ((a) * (a))
#define DISTANCE(a, b, c, d) (SQ(a - c) + SQ(b - d))

void savepixels(const char *filename, int width, int height, unsigned char *bitmap)
{
   unsigned char *pixels = new unsigned char[width * height * 3];
   unsigned char *p = pixels, *b = bitmap;
   int col, row;

   for (row = 0; row < height; row++)
   for (col = 0; col < width; col++)
   {
      *p++ = *b;
      *p++ = *b;
      *p++ = *b++;
   }

   corona::Image *img = corona::CreateImage(width, height, IMAGE_FORMAT, pixels);
   corona::SaveImage(filename, corona::FF_AUTODETECT, img);
   delete img;
   delete[] pixels; // CreateImage copies the buffer, so free our own copy
}

int main(int argc, char **argv)
{
   corona::Image *img;
   unsigned char *pixels;
   char filename[255], *nl;
   int threshold = 0;
   int distance = 0;
   int pixel;
   char str[255];
   int rows, cols;
   int row, col;
   unsigned char *bitmap, *p;
   unsigned char *outmap;
   FIBITMAP *image;

   if (argc > 1)
   {
      strncpy(filename, argv[1], sizeof(filename) - 13);
      filename[sizeof(filename)-13] = 0;

      if (!strcmpi(filename, "/?") || !strcmpi(filename, "-h") || !strcmpi(filename, "--help"))
      {
         printf("captcha <filename> <distance> <threshold>\n");
         return 0;
      }
      
      if (argc > 3)
      {
         distance = atoi(argv[2]);
         threshold = atoi(argv[3]);
      }
   }
   else
   {
      printf("Enter captcha filename: ");
      fgets(filename, sizeof(filename) - 13, stdin);

      nl = strchr(filename, '\n');
      if (nl)
         *nl = 0;
   }

   img = corona::OpenImage(filename, IMAGE_FORMAT);

   if (!img)
   {
      printf("Could not open file %s\n", filename);
      return 1;
   }

   pixels = (unsigned char*)img->getPixels();
   rows = img->getHeight();
   cols = img->getWidth();
   bitmap = new unsigned char[rows * cols];
   p = bitmap;
   outmap = new unsigned char[rows * cols];

   //convert to grayscale of a single byte
   for (row = 0; row < rows; row++)
   for (col = 0; col < cols; col++)
   {
      pixel = *pixels++;
      pixel += *pixels++;
      pixel += *pixels++;

      *p++ = pixel / 3;
   }

   //free corona loading
   delete img;

   while (distance <= 0 || distance >= 1000)
   {
      printf("Enter distance to check: ");
      fgets(str, sizeof(str) - 1, stdin);
      distance = atoi(str);
   }

   while (threshold <= 0 || threshold >= 255)
   {
      printf("Enter threshold: ");
      fgets(str, sizeof(str) - 1, stdin);
      threshold = atoi(str);
   }

   //check our threshold
   for (row = 0; row < rows; row++)
   for (col = 0; col < cols; col++)
   {
//      if (bitmap[GetXY(col, row, cols)])
      {
         int count = 0;
         int x, y;
         int dhalf = distance / 2 + 1;

         //could optimize here heavily, by only checking inside a circle rather than square + distance
         for (x = MAX(col - dhalf, 0); x < MIN(col + dhalf, cols); x++)
         for (y = MAX(row - dhalf, 0); y < MIN(row + dhalf, rows); y++)
         {
            if (SQ(distance) > DISTANCE(col, row, x, y) && bitmap[GetXY(x, y, cols)])
               count++;
         }

         if (count >= threshold)
            outmap[GetXY(col, row, cols)] = 0;
         else
            outmap[GetXY(col, row, cols)] = 255;
      }
//      else
//         outmap[GetXY(col, row, cols)] = 0;
   }

   //save output
   nl = strrchr(filename, '.');
   if (!nl)
      nl = filename + strlen(filename); // no extension: append the suffix at the end
   sprintf(nl, "-%03d-%03d.png", distance, threshold);

   savepixels(filename, cols, rows, outmap);

   image = FreeImage_Load(FIF_PNG, filename, 0);
   FIBITMAP *scaled = FreeImage_Rescale(image, cols / 4, rows / 4, FILTER_BICUBIC); // returns a new bitmap
   FreeImage_Save(FIF_BMP, scaled, "temp.bmp", 0); // this one's just so I can easily SEE the image :P
   FreeImage_Save(FIF_PPM, scaled, "temp.ppm", 0);
   FreeImage_Unload(scaled);
   FreeImage_Unload(image); // the original bitmap has to be freed too

   system("gocr -v 7 -c _ -C \"1234567890ero\" temp.ppm > result.txt 2>&1");

   delete[] bitmap; // allocated with new[], so release with delete[]
   delete[] outmap;

   return 0;
}


I'll do a few test runs on other images and post the results.

Oh and as you can see I'm using the FreeImage library, I don't know if Corona is capable of what I used FreeImage for, but whatever, that's not a problem for now.

SpecShip
Posted: Fri Jun 23, 2006 5:59 am

Must restrain myself...must resist urge...

Mine GO BOOM
Posted: Fri Jun 23, 2006 12:21 pm

Solo Ace wrote:
Bah, I feel dumb now.
Maybe programming isn't meant for me, or maybe I just need more practice or intelligence. :p

Have you compiled the GOCR yourself?

[...]

Note: I'm using 'ero' in "1234567890ero" because I've seen an image containing "err" once, guess that's a good reason to check for it.

I have GOCR compiled on Gentoo with whatever its defaults are. It uses GTK to load image formats. The default compiled one for Windows doesn't contain this, thus its image restriction. Cygwin's GOCR loads image formats fine.

If that's the case, don't use the -C flag. I only did that because I thought it was numbers only (should have used the -n 1 flag then). Just let it attempt to guess any valid character instead.

Try your current method with a bunch more images. Once you get stuck on some, post them here, either as a PNG image or zipped bmp/ppm.

t3rmin4t0r
Newbie

Age: 34
Gender: Male
Joined: May 14 2011
Posts: 1

Posted: Sat May 14, 2011 4:02 pm

Hi,
Sorry, I am new here. I want to convert the following CAPTCHA image into text. I am using VB6. How do I do it? Would you be willing to share the code?



I can convert its background to black, but what's next? :roll:

-Thanks in advance




Attachment:
Captha0.png - 1.89 KB