<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rick's Awesome Blog &#187; linux</title>
	<atom:link href="http://www.richardosgood.com/blog/tag/linux/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.richardosgood.com/blog</link>
	<description>(Not) Just another WordPress weblog</description>
	<lastBuildDate>Sun, 05 Apr 2009 12:23:05 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>New Project Completed</title>
		<link>http://www.richardosgood.com/blog/2008/05/22/new-project-compelted/</link>
		<comments>http://www.richardosgood.com/blog/2008/05/22/new-project-compelted/#comments</comments>
		<pubDate>Thu, 22 May 2008 22:05:02 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Project]]></category>
		<category><![CDATA[Update]]></category>
		<category><![CDATA[anniversary]]></category>
		<category><![CDATA[lamp]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[sx]]></category>

		<guid isPermaLink="false">http://rickspbx.dyndns.org:81/blog/?p=52</guid>
		<description><![CDATA[It&#8217;s been a while since I posted on here.  There are several reasons for that.  The main reason is that my latest project has been taking all my spare time and it was a secret.  I didn&#8217;t log any of it until just a few minutes ago because I didn&#8217;t want the [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been a while since I posted on here.  There are several reasons for that.  The main reason is that my latest project has been taking all my spare time and it was a secret.  I didn&#8217;t log any of it until just a few minutes ago because I didn&#8217;t want the secret to get out.  It is an anniversary present for my girlfriend.  You can check out the project page for more details on that.</p>
<p>The second reason is that my web server has been down and I haven&#8217;t fixed it until recently.  My server rebooted one day when I lost power and Apache refused to start for some reason.  Rather than sitting down to fix that, I just spent all my time working on the anniversary project.  It turns out there was some other instance of httpd running in the background hogging port 81.  I have no idea why this was.  I&#8217;ll have to reboot the system again to see if the problem occurs again.  At least I&#8217;ll know what the problem is.</p>
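<p>A quick way to confirm that something is already listening on a port before starting Apache is to try connecting to it. The snippet below is only a minimal sketch: the helper name <em>port_in_use</em> is made up for illustration, and it only reports that a listener exists, not which process owns it.</p>

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds,
        # which means a listener already holds the port.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 81 busy:", port_in_use(81))
```

<p>If this reports the port as busy while Apache is stopped, some other process is holding it.</p>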
<p>In other news, I started the Near Space class at school last week.  I am really excited for this class.  We will be sending a balloon equipped with a computer, science experiments, and a camera into near space in just a few months.  Ryan is splitting the class into teams and should have them posted on the e-shell at some point this weekend.  Hopefully I&#8217;ll have access to the shell soon.  I just e-mailed a local enthusiast to see if he wants to come to class to share his experiences and offer some words of wisdom.  Hopefully that will go over well.</p>
<p>My dad should be sending me another radio, antenna and a Tiny Trak 3 module next week.  I can&#8217;t wait to get that stuff.  I want to start messing with APRS tracking as soon as possible to get a feel for it before we actually do a launch.  I&#8217;m hoping to be on the tracking and telemetry team for the near space class.</p>
<p>I suppose that&#8217;s enough updating for now.  I have to take some photos of the anniversary lamp to stick on that page, as well as get a schematic up.  Man, I still need to get a schematic up on the graduation hacks page&#8230;  I&#8217;ll get on that soon.  I&#8217;ll also post a video of the lamp in action.  Until then.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.richardosgood.com/blog/2008/05/22/new-project-compelted/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>E-mail Harvest</title>
		<link>http://www.richardosgood.com/blog/2008/04/10/email-harvest/</link>
		<comments>http://www.richardosgood.com/blog/2008/04/10/email-harvest/#comments</comments>
		<pubDate>Thu, 10 Apr 2008 10:25:22 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Project]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[script]]></category>

		<guid isPermaLink="false">http://rickspbx.dyndns.org:81/blog/?p=33</guid>
		<description><![CDATA[I&#8217;m starting to work on the E-mail harvesting program now.  The other day I went to myspace and took a look around.  Guess what?  No e-mail addresses are visible anywhere.  There&#8217;s no specific place to pull e-mail addresses from.  That&#8217;s when I decided to go check out facebook.  These [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m starting to work on the E-mail harvesting program now.  The other day I went to myspace and took a look around.  Guess what?  No e-mail addresses are visible anywhere.  There&#8217;s no specific place to pull e-mail addresses from.  That&#8217;s when I decided to go check out facebook.  These guys are crafty.  They include your e-mail address but they include it as an image.  That way you can&#8217;t just copy and paste the text.  Well, I like to think that I am craftier.  I started doing a little Google research on linux-based OCR software.  For those who don&#8217;t already know, OCR stands for optical character recognition.  This software will read an image and turn the text located within it into an actual editable text document.</p>
<p>I found <a title="this awesome article" href="http://groundstate.ca/ocr">this awesome article</a> comparing many different OCR engines designed for linux. I&#8217;ve decided that gocr is the simplest solution that should do everything I need it to.  I just need a program I can send an image to and have that program send me back text.  That is exactly how gocr works.  Now I just have to get it installed on CentOS.</p>
<p>I found the source for gocr at <a href="http://jocr.sourceforge.net">http://jocr.sourceforge.net</a>.  I just run the command:</p>
<p><em> wget http://prdownloads.sourceforge.net/jocr/gocr-0.45.tar.gz</em></p>
<p>Then I extract the file:</p>
<p><em> tar -xzvf gocr-0.45.tar.gz</em></p>
<p>Change into the extracted directory, then configure, make, and install:</p>
<p><em> cd gocr-0.45<br />
./configure<br />
make<br />
sudo make install</em></p>
<p>The image files on facebook are png images.  gocr uses a utility called pngtopnm to convert the image to a format it can understand.  This utility is included in the netpbm package.</p>
<p><em>sudo yum install netpbm</em><br />
<em>sudo yum install netpbm-progs</em></p>
<p>Now that everything is installed I can just try running the program with a downloaded facebook email image.</p>
<p><em>gocr -i test.png</em></p>
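<p>Since the harvesting program will eventually need to call gocr for every image it downloads, the same command can be wrapped in a small script.  This is only a rough sketch: the function names <em>gocr_command</em> and <em>run_gocr</em> are made up for illustration, and it assumes gocr is on the PATH and prints the recognized text to standard output.</p>

```python
import subprocess


def gocr_command(image_path):
    """Build the same command line used above: gocr -i <image>."""
    return ["gocr", "-i", image_path]


def run_gocr(image_path):
    """Run gocr on one image and return the text it prints to stdout."""
    result = subprocess.run(
        gocr_command(image_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as text rather than bytes
        check=True,           # raise CalledProcessError if gocr exits non-zero
    )
    return result.stdout.strip()
```

<p>Calling <em>run_gocr("test.png")</em> would then return whatever gocr recognized in the image as a string.</p>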
<p>The image I gave it contained my email address &#8220;ricosgoo@uat.edu&#8221;.  The result: &#8220;ricgoouat.edu&#8221;.  It seems as though gocr didn&#8217;t pick it up correctly.  I&#8217;m pretty sure the reason is that the &#8216;o&#8217; and the &#8216;s&#8217; in the image are touching each other.  gocr probably thinks it is one character and cannot recognize it so it is just leaving it out.  Also, it missed the @ symbol.  I tried a different facebook image and the @ sign was missing from that as well.  It would seem as though gocr does not support the @ sign in its dictionary.  I might need to try a different OCR program.</p>
<p>Doing some more Google research, I found that many people feel that HP&#8217;s tesseract-ocr is one of the best open-source OCR engines there is.  That was my next logical step.  I followed <a href="http://groundstate.ca/ocr">this guide</a> again to get the software up and running.</p>
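<p>A bad result like this is easy to catch automatically by checking whether the OCR output even looks like an e-mail address.  The pattern below is a deliberately loose sketch, not a full address validator, and the name <em>looks_like_email</em> is made up for illustration.</p>

```python
import re

# Loose shape check: something@domain.tld (an assumption, not a full RFC validator).
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def looks_like_email(text):
    """Return True if the OCR output has the basic shape of an e-mail address."""
    return EMAIL_RE.match(text.strip()) is not None


if __name__ == "__main__":
    for candidate in ("ricosgoo@uat.edu", "ricgoouat.edu"):
        print(candidate, looks_like_email(candidate))
```

<p>The expected address passes this check, while the text gocr produced fails it because the @ sign is missing.</p>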
<p><em>wget http://tesseract-ocr.googlecode.com/files/tesseract-2.01.tar.gz<br />
tar -xzvf tesseract-2.01.tar.gz<br />
cd tesseract-2.01<br />
./configure<br />
make<br />
sudo make install</em></p>
<p>Now I have to install the English language dictionary files for tesseract.</p>
<p><em>wget http://tesseract-ocr.googlecode.com/files/tesseract-2.00.eng.tar.gz<br />
tar -xzvf tesseract-2.00.eng.tar.gz<br />
cd tesseract-2.00.eng<br />
sudo cp * /usr/local/share/tessdata/</em></p>
<p>I also needed to install ImageMagick so that I can convert the facebook images to tiff files.  I have to do this because tesseract-ocr only supports tiff images right now.</p>
<p><em>sudo yum install ImageMagick.i386</em></p>
<p>Now I convert the image to a tiff file.</p>
<p><em>convert test.png test.tiff</em></p>
<p>Now I try out the OCR.</p>
<p><em>tesseract test.tiff test.txt</em></p>
<p>No good.  I get error messages.  Here is Tesseract&#8217;s output:</p>
<p><em>Tesseract Open Source OCR Engine<br />
name_to_image_type:Error:Unrecognized image type:test.tiff<br />
IMAGE::read_header:Error:Can&#8217;t read this image type:test.tiff<br />
tesseract:Error:Read of file failed:test.tiff<br />
Signal_exit 31 ABORT. LocCode: 3  AbortCode: 3</em></p>
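<p>The first error line complains about the image type by file name, so one guess worth testing (not confirmed here) is that this version of tesseract only recognizes the shorter <em>.tif</em> extension.  Separately, tesseract adds <em>.txt</em> to the output name on its own, so the output argument is normally given without an extension.  The sketch below just builds the two command lines under those assumptions; the function name <em>tesseract_commands</em> is made up for illustration.</p>

```python
import os


def tesseract_commands(png_path):
    """Return the convert and tesseract command lines for one PNG image."""
    base, _ext = os.path.splitext(png_path)
    # Assumption: the short .tif extension is the one this tesseract version accepts.
    tif_path = base + ".tif"
    convert_cmd = ["convert", png_path, tif_path]
    # tesseract appends ".txt" to the output base itself, so pass "test", not "test.txt".
    ocr_cmd = ["tesseract", tif_path, base]
    return convert_cmd, ocr_cmd


if __name__ == "__main__":
    for cmd in tesseract_commands("test.png"):
        print(" ".join(cmd))
```

<p>Running the two printed commands by hand would show quickly whether the extension is really the problem.</p>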
<p>I have to take a break from all this now, so I&#8217;ll deal with these problems later.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.richardosgood.com/blog/2008/04/10/email-harvest/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
