<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rick's Awesome Blog &#187; script</title>
	<atom:link href="http://www.richardosgood.com/blog/tag/script/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.richardosgood.com/blog</link>
	<description>(Not) Just another WordPress weblog</description>
	<lastBuildDate>Sun, 05 Apr 2009 12:23:05 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>E-mail Harvest</title>
		<link>http://www.richardosgood.com/blog/2008/04/10/email-harvest/</link>
		<comments>http://www.richardosgood.com/blog/2008/04/10/email-harvest/#comments</comments>
		<pubDate>Thu, 10 Apr 2008 10:25:22 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Project]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[script]]></category>

		<guid isPermaLink="false">http://rickspbx.dyndns.org:81/blog/?p=33</guid>
		<description><![CDATA[I&#8217;m starting to work on the E-mail harvesting program now.  The other day I went to MySpace and took a look around.  Guess what?  No e-mail addresses are visible anywhere.  There&#8217;s no specific place to pull e-mail addresses from.  That&#8217;s when I decided to go check out Facebook.  These [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m starting to work on the E-mail harvesting program now.  The other day I went to MySpace and took a look around.  Guess what?  No e-mail addresses are visible anywhere.  There&#8217;s no specific place to pull e-mail addresses from.  That&#8217;s when I decided to go check out Facebook.  These guys are crafty.  They include your e-mail address, but they include it as an image.  That way you can&#8217;t just copy and paste the text.  Well, I like to think that I am craftier.  I started doing a little Google research on Linux-based OCR software.  For those that don&#8217;t already know, OCR stands for optical character recognition.  This software will read an image and turn the text located within it into an actual editable text document.</p>
<p>I found <a title="this awesome article" href="http://groundstate.ca/ocr">this awesome article</a> comparing many different OCR engines designed for Linux.  I&#8217;ve decided that gocr is the simplest solution that should do everything I need it to.  I just need a program I can send an image to and have that program send me back text.  That is exactly how gocr works.  Now I just have to get it installed on CentOS.</p>
<p>I found the source for gocr at <a href="http://jocr.sourceforge.net">http://jocr.sourceforge.net</a>.  I just run the command:</p>
<p><em> wget http://prdownloads.sourceforge.net/jocr/gocr-0.45.tar.gz</em></p>
<p>Then I extract the file:</p>
<p><em> tar -xzvf gocr-0.45.tar.gz</em></p>
<p>Then I change into the extracted directory, configure, make, and install:</p>
<p><em> cd gocr-0.45<br />
./configure<br />
make<br />
sudo make install</em></p>
<p>The image files on Facebook are PNG images.  gocr uses a utility called pngtopnm to convert the image to a format it can understand.  This utility is included in the netpbm package.</p>
<p><em>sudo yum install netpbm</em><br />
<em>sudo yum install netpbm-progs</em></p>
<p>Now that everything is installed, I can just try running the program with a downloaded Facebook e-mail image.</p>
<p><em>gocr -i test.png</em></p>
<p>The image I gave it contained my e-mail address &#8220;ricosgoo@uat.edu&#8221;.  The result: &#8220;ricgoouat.edu&#8221;.  It seems as though gocr didn&#8217;t pick it up correctly.  I&#8217;m pretty sure the reason is that the &#8216;o&#8217; and the &#8216;s&#8217; in the image are touching each other.  gocr probably thinks they are one character, cannot recognize it, and so just leaves it out.  Also, it missed the @ symbol.  I tried a different Facebook image and the @ sign was missing from that as well.  It would seem as though gocr does not support the @ sign in its dictionary.  I might need to try a different OCR program.</p>
<p>Doing some more Google research, I found that many people feel that HP&#8217;s tesseract-ocr is one of the best open-source OCR engines there is.  That was my next logical step.  I followed <a href="http://groundstate.ca/ocr">this guide</a> again to get the software up and running.</p>
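<p>A minimal sketch of how the gap between the expected text and the OCR result could be measured. The function names <code>edit_distance</code> and <code>missing_characters</code> are hypothetical helpers made up for illustration, not part of gocr. The first computes the standard Levenshtein distance; the second is only a greedy left-to-right scan, so it is a rough indicator rather than a true diff.</p>

```python
def edit_distance(expected: str, actual: str) -> int:
    """Return the Levenshtein distance between two strings."""
    # previous[j] holds the distance between the processed prefix of
    # `expected` and the first j characters of `actual`.
    previous = list(range(len(actual) + 1))
    for i, exp_char in enumerate(expected, start=1):
        current = [i]
        for j, act_char in enumerate(actual, start=1):
            cost = 0 if exp_char == act_char else 1
            current.append(min(
                previous[j] + 1,         # drop exp_char
                current[j - 1] + 1,      # insert act_char
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def missing_characters(expected: str, actual: str) -> str:
    """Return characters of `expected` that a greedy scan cannot find in `actual`."""
    missing = []
    position = 0
    for char in expected:
        # Slicing avoids an index error once `actual` is exhausted.
        if actual[position:position + 1] == char:
            position += 1
        else:
            missing.append(char)
    return "".join(missing)


if __name__ == "__main__":
    expected = "ricosgoo@uat.edu"  # text in the image, as given in the post
    actual = "ricgoouat.edu"       # what gocr returned, as given in the post
    print(edit_distance(expected, actual))
    print(missing_characters(expected, actual))
```

<p>For the two strings from this post, the distance is 3 and the dropped characters are the &#8216;o&#8217;, the &#8216;s&#8217;, and the @ sign, which matches the observations above.</p>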
<p><em>wget http://tesseract-ocr.googlecode.com/files/tesseract-2.01.tar.gz<br />
tar -xzvf tesseract-2.01.tar.gz<br />
cd tesseract-2.01<br />
./configure<br />
make<br />
sudo make install</em></p>
<p>Now I have to install the English language dictionary files for tesseract.</p>
<p><em>wget http://tesseract-ocr.googlecode.com/files/tesseract-2.00.eng.tar.gz<br />
tar -xzvf tesseract-2.00.eng.tar.gz<br />
cd tesseract-2.00.eng<br />
sudo cp * /usr/local/share/tessdata/</em></p>
<p>I also needed to install ImageMagick so that I can convert the Facebook images to TIFF files.  I have to do this because tesseract-ocr only supports TIFF images right now.</p>
<p><em>sudo yum install ImageMagick.i386</em></p>
<p>Now I convert the image to a tiff file.</p>
<p><em>convert test.png test.tiff</em></p>
<p>Now I try out the OCR.</p>
<p><em>tesseract test.tiff test.txt</em></p>
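<p>The two steps above could be wrapped in a small script. This is only a sketch under stated assumptions: it mirrors the <code>convert</code> and <code>tesseract</code> command lines exactly as typed in this post, it works on a local image file, and the names <code>build_ocr_commands</code> and <code>run_ocr</code> are hypothetical.</p>

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_commands(png_path: str) -> list[list[str]]:
    """Build the convert and tesseract command lines for one local PNG file."""
    stem = Path(png_path).with_suffix("")  # "test.png" becomes "test"
    tiff_path = f"{stem}.tiff"
    text_path = f"{stem}.txt"
    return [
        ["convert", png_path, tiff_path],    # ImageMagick: PNG to TIFF
        ["tesseract", tiff_path, text_path],  # same argument order as in the post
    ]


def run_ocr(png_path: str) -> None:
    """Run both steps in order, stopping at the first failure."""
    for command in build_ocr_commands(png_path):
        # Fail early with a clear message if a tool is not installed.
        if shutil.which(command[0]) is None:
            raise FileNotFoundError(f"{command[0]} is not on PATH")
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for command in build_ocr_commands("test.png"):
        print(" ".join(command))
```

<p>Keeping the command construction separate from the execution makes it possible to inspect the exact command lines without having either tool installed.</p>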
<p>No good.  I get error messages.  Here is Tesseract&#8217;s output:</p>
<p><em>Tesseract Open Source OCR Engine<br />
name_to_image_type:Error:Unrecognized image type:test.tiff<br />
IMAGE::read_header:Error:Can&#8217;t read this image type:test.tiff<br />
tesseract:Error:Read of file failed:test.tiff<br />
Signal_exit 31 ABORT. LocCode: 3  AbortCode: 3</em></p>
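<p>A minimal sketch for pulling the error lines out of output like this. It assumes every error line has the shape <code>source:Error:message:subject</code>, which holds for the three error lines shown here but is not a documented Tesseract format. The names <code>TesseractError</code> and <code>parse_errors</code> are hypothetical.</p>

```python
from typing import NamedTuple

# The output quoted in this post, with a plain apostrophe in "Can't".
SAMPLE_OUTPUT = """Tesseract Open Source OCR Engine
name_to_image_type:Error:Unrecognized image type:test.tiff
IMAGE::read_header:Error:Can't read this image type:test.tiff
tesseract:Error:Read of file failed:test.tiff
Signal_exit 31 ABORT. LocCode: 3  AbortCode: 3"""


class TesseractError(NamedTuple):
    source: str   # routine that reported the error
    message: str  # human-readable description
    subject: str  # what the error refers to, here the input file


def parse_errors(output: str) -> list[TesseractError]:
    """Collect every line that contains the ':Error:' marker."""
    errors = []
    for line in output.splitlines():
        # Split on the marker, not on ':', because a source such as
        # 'IMAGE::read_header' contains colons of its own.
        source, marker, rest = line.partition(":Error:")
        if not marker:
            continue  # banner and abort lines carry no marker
        message, separator, subject = rest.rpartition(":")
        if not separator:
            message, subject = rest, ""  # no trailing subject on this line
        errors.append(TesseractError(source.strip(), message, subject))
    return errors


if __name__ == "__main__":
    for error in parse_errors(SAMPLE_OUTPUT):
        print(f"{error.source}: {error.message} ({error.subject})")
```

<p>Run against the output above, all three errors name the same subject, <code>test.tiff</code>, so the failure happens while reading the input image, before any recognition is attempted.</p>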
<p>I have to take a break from all this now, so I&#8217;ll deal with these problems later.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.richardosgood.com/blog/2008/04/10/email-harvest/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Another new idea and a cantenna update</title>
		<link>http://www.richardosgood.com/blog/2008/04/08/another-new-idea-and-a-cantenna-update/</link>
		<comments>http://www.richardosgood.com/blog/2008/04/08/another-new-idea-and-a-cantenna-update/#comments</comments>
		<pubDate>Tue, 08 Apr 2008 23:55:14 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Project]]></category>
		<category><![CDATA[Update]]></category>
		<category><![CDATA[cantenna]]></category>
		<category><![CDATA[script]]></category>

		<guid isPermaLink="false">http://rickspbx.dyndns.org:81/blog/?p=29</guid>
		<description><![CDATA[Today I only went to one class: Law370.  Normally, I really hate the thought of going to the class, but it&#8217;s always a lot of fun.  That professor really knows how to teach.  I always learn something new from that class.  Today, we were separated into groups and each group had [...]]]></description>
			<content:encoded><![CDATA[<p>Today I only went to one class: Law370.  Normally, I really hate the thought of going to the class, but it&#8217;s always a lot of fun.  That professor really knows how to teach.  I always learn something new from that class.  Today, we were separated into groups and each group had to research a specific law regarding cyber-crime.  This whole activity spawned a new project idea.</p>
<p>My group was assigned the CAN-SPAM Act of 2003.  This act basically has all these rules regulating how spam e-mail can be sent.  I&#8217;m not going into that because it&#8217;s long, it&#8217;s complicated, and it really doesn&#8217;t matter for my project idea.  My project basically will be a script that will crawl social networking sites like Facebook and MySpace to collect e-mail addresses.  It gets more diabolical than that, though.  The script will log onto someone&#8217;s MySpace account and get their e-mail.  Then, the script will log onto each of that person&#8217;s &#8220;Top 8&#8221; friends and get THEIR e-mail addresses.  Now, the script can send a phishing e-mail to each of the friends on the &#8220;Top 8&#8221; list and spoof the e-mail that it originates from to look like it is coming from the original person.  I think this would be an awesome and fun proof of concept.  I would never actually use this for my own malicious purposes, although I would be interested to see how well it would actually work.  I really just want to write this script just to do it.  It would give me an excuse to brush up on my scripting and programming skills.</p>
<p>I think I&#8217;ll get started on this idea soon, seeing as it won&#8217;t cost me any money.</p>
<p>Another update here.  I started working on the cantenna project some more.  I bought the pigtail that I need, cut off one end, and soldered it to the PCMCIA card.  I&#8217;ve also soldered the piece of copper wire to the jack that attaches to the can.  All I need now is a can to attach this thing to.  The solder points on the PCB were so small that I&#8217;m not sure the connections will be good enough.  Hopefully I&#8217;ll find out tomorrow.  I don&#8217;t have any class, so I have the entire day off.  I plan on getting a can either from the cafe at school or from the supermarket.  I shall update the cantenna page as time permits.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.richardosgood.com/blog/2008/04/08/another-new-idea-and-a-cantenna-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
