Programmer Of The Month (POTM) contest ... THE SUMMER EDITION ....
 Deadline for this one is midnight on Tuesday August 31, 1993.

	How hard is it to look up a word in a dictionary when 
	you don't know how to spell the word? 

	How hard is it to find the "correct" number in a list 
	when all you have is a number where you think that 
	"most" of the digits are OK?

This latter question is the subject of the Summer POTM:  MR. NOISE!

============================================================================
Here is the model ...
                 -------------
TRUE NUMBERS ->  | Mr. Noise |  ->  CORRUPTED NUMBERS ->  ----------------
   /\            -------------                            |              |
  /||\                                                    | Your Program |
   ||                                                     |              |
  LIST OF GOOD NUMBERS  ------------------------------->  ----------------
                                                                 ||
                                                                \||/
 I'll tell you all about what Mr. Noise does to the              \/
 numbers, and I'll score you on how well YOUR GUESSES      YOUR GUESSES
 match the TRUE NUMBERS I start with.  In fact,
 the actual Mr.Noise shell program is provided below!
============================================================================

 If you deduce a TRUE NUMBER you get a reward (10).  If you give up,
 you get a smaller reward (3).  If you guess WRONG, you get nothing!

 (For example, the TRUE number is 5551212.  Mr. Noise corrupted it
 to 55612192.  You have to determine that the intended number in the
 good number list was really 5551212.)

THE GORY DETAILS FOLLOW ... READ ON IF YOU WANT TO TACKLE THE CHALLENGE!
   IF NOT .... SEE YOU NEXT TIME!!!   (Please join us ... the solution
   to this one may even have AT&T related application use!)

=Fred   (homxb!fah)  (Send mail if you want to be taken OFF the mailing list!)

============================= THE DETAILS =============================

YOUR PROGRAM:

  1) Assuming your program is a.out, then it should take two arguments:

	a.out /tmp/good_nums /tmp/test_list

	The first arg (/tmp/good_nums for example) is the name of a file 
	containing valid numbers.  /tmp/good_nums contains one seven digit 
	number per line, with no white space.  This file will not contain
	numbers beginning with "0".  The file will be numerically sorted
	and will contain no repetitions.  (A shell/awk program is provided 
	to generate a sample list of valid random numbers.)

	The second arg (/tmp/test_list for example) is the name of a file
	containing digit strings to be tested.  This input file contains
	one number per line, between 1 and 14 digits each, with no white 
	space.  There MAY be leading zeroes in these numbers.  There MAY
	be repeats and it need not be sorted.  (I will generate this file
	by corrupting a subset of the valid numbers with the MR.NOISE shell
	using some fixed set of probabilities.)

  2) Your program should generate, to standard output, exactly one output
	number for each input number in /tmp/test_list (the second arg.).

	Line N of the output should contain the results of your test of
	line N of the input - and should be either a valid number
	from /tmp/good_nums or "0000000" to signify that you chose not to
	make a guess on this input line.  (If your output is NOT in the
	/tmp/good_nums list, it will be treated as a bad guess.)

  3) Name your program something clever ... the last set of program names
	were particularly boring.

THE DIGITS:

  1) All "digits" are zero through 9, provided via [0123456789]
  
  2) The "valid" numbers in the good_nums file will be seven digits long.
	They may begin with any digit except 0 (zero).  They thus
	range from 1000000 to 9999999 numerically.
  
  3) I will corrupt the seven digit number and will return a digit string
  	using the digits [0-9].  This digit string will be between 1 and
  	14 digits long (no white space).  See the section on CORRUPTION below.

CORRUPTION ... (I am Mr. Noise ... I will wreak havoc ...):

  1) I will corrupt a seven digit number by examining each digit in turn,
	and applying one or more of three different corruptions to that 
	digit based on probabilities generated from random numbers.

  2) With probability P(change) I will change the single digit to some
	other digit.  (e.g. 9 -> 4) 

  3) With probability P(remove) I will delete the digit altogether.

  4) With probability P(add) I will add a random digit to the string
	following this digit.  (e.g. 3 -> 37 )

  5) With probability P(surprise) I will generate a completely random
	seven digit number (on a separate line) in addition to corrupting 
	the input number.  

  6) An "awk" program is provided ... this is what I will use to corrupt
	the seven digit number.  (Note that if your version of awk does not
	support the "rand" function, try "nawk" or "xawk".)

  7) The probabilities for each digit are fixed for each test run and follow:

        	0. <= P(change)   <= .20
		0. <= P(remove)   <= .20
		0. <= P(add)      <= .20  (given a digit was not removed)
		0. <= P(surprise) <= .10  (applied once per input number)

	The probabilities will be applied to each digit of the TRUE number 
		as demonstrated in the awk program.

  8) As shown by the awk program, I may either add or delete a digit - but not
	both.  If I change a digit, I may then add another or delete that
	digit.  Independent of what I changed, added, or deleted - I may
	add at most one "surprise" number for each input number.

THE SCORING:

The scoring is based on the numbers you return on standard output:

  1) If the number is 0000000 it will be assumed that you decided not 
	to correct the number - you will be awarded 3 points regardless
	of what the "true" input was. (Exception: see #4 below)

  2) If you return the CORRECT uncorrupted 7 digit number, you will be
	awarded 10 points. CORRECT means that you returned the number
	that I started with before I applied the corruption - I will
	determine which number I started with based on the list I gave
	to the "Mr.Noise" awk script.  (It is possible that you return
	a "closer" entry than the one I started with ... no matter, it
	is MY number that earns you the points - the one that would be
	printed if DEBUG=1 in the awk script.)

  3) If you return an INCORRECT 7 digit number, you get no points.

  4) When "Mr.Noise" adds a random number (with P(surprise)), I will
	assume that "0000000" is a CORRECT guess (worth 10 points).
	If "Mr.Noise" adds a random number that happens to be on the 
	good_nums list, I will remove the number from the list before 
	running the tests.

  5) Thus, for 1,000 input numbers, the worst score would be zero.  The
	best score would be 10,000.  If your program does absolutely
	nothing except output 0000000 every time, your score will be 
	at least 3,000.

============================= THE JUDGING =============================

  1) For system test, I will generate a valid good_nums list of length 
	5,000 and five lists of 1000 test numbers each using the awk 
	scripts included below.  The five tests will each have different
	probabilities used.  Your score will be the number of points 
	earned when I compare your five outputs with the five perfect 
	inputs I use to drive the awk script.  As always, you will be
	mailed your system test results (as well as those of other players!)

	(Note that with DEBUG=1, the awk script will generate the
	CORRECT numbers as well as the corrupted numbers.)

  2) For the finals, I will generate 10 such lists - each of which will
	contain between 100 and 5,000 test entries.  These will be played
	against valid number files containing between 500 and 10,000
	entries each.

  3) Final score will be the sum of the scores on the ten runs.

  4) In the (unlikely??) event of a tie,  the quickest (sys+user) program
	will win.  Unlike the CAT->DOG POTM, I do NOT expect to have to
	resort to this tie-breaker for MR.NOISE.

  5) Any arbitrary decisions made by me along the way will be valid.

============================= THE ENVIRONMENT =========================

I run on an Amdahl 5870 ... homxb by name.  You supply me with source code.
(I can handle K&R C code (no ANSI), C++, or anything supported by exptools.
  I'd be willing to try anything you think might work ... so try!)

I will "make" the code when I receive it and get back to you immediately (?)
with your system test results.  In the past, it has proven wise to submit 
your entry EARLY so that we can work through any portability issues via email.

I will distribute weekly or bi-weekly leader boards based on the system tests
ONLY to those folks who have sent me an entry to that date.

a) No include files other than "stdio.h" in C.  For C++ add "stdlib.h"
	and "string.h".  Others by special request only.
b) No non-standard libraries are permitted.
c) sys+user time on the five system tests must total less than 300 seconds
	in order to qualify for the finals.
d) Any temp files must be created in /tmp and removed when done.
e) No assembler code (too much advantage for local Amdahlers).
f) Time is measured with timex, sys+user, for better or worse.
g) Standard compile line is "cc -O" for C programs, others as provided.
h) Program must end normally - no core dumps!
i) Your submission may be distributed freely and used by anyone for anything
j) this list will grow for future POTMs ...

============================= SUBMITTING AN ENTRY ====================

Use "uuto filename homxb!fah" to send me an entry ... please accompany it
with some mail explaining how to compile the entry.  If you prefer, you
may send your entry via email to homxb!fah or attmail!hicinbothem.  

I will not examine your source since I will be competing, but I will ship you
lint output, or compile errors, or stuff like that if things don't work
out the first time!  

You may submit multiple entries, but ONLY YOUR LAST ENTRY WILL BE SAVED.  
I must have a single email address and name to associate with each entry, 
even if you choose to work in teams - all results will go to this individual. 

Questions submitted via email will be answered.

All entries must be received by the closing date/time - as measured in NJ!

============================= TWO USEFUL SHELL PROGRAMS ==============

The following are two programs which should prove helpful in testing
 your programs.  They are the programs I will use for testing them!

############################# CUT HERE ##################################
#
# This program is Mr.Noise.  It is an shell/awk program that uses the built
#  in function "rand".  If the version of awk on your system does not
#  support "rand" ... try xawk (exptools) or nawk.  If this still does
#  not work ... write a C program and accept my apologies!
#
# This program expects a standard input containing seven digit numbers,
#  one per line.  It will corrupt these numbers and
#  produce a corresponding string of corrupted digits on stdout.
#  Every now and then it will add a completely random number.
#
# The loop at the front contains a SEED that will be changed by ME when
#  I run the tests ... so don't try and predict the errors precisely!
#  Note that SEED should be pretty big to avoid generating numbers that
#  might also be in the dictionary (the other shell program).
#  During testing I will also modify the four probabilities (see rules).
#
# Setting DEBUG=1 will cause the program to print out TWO numbers per
#  line - the corrupted number followed by the "Correct" number

sed "s/./& /g" |		# put spaces in the numbers for awk

xawk ' BEGIN { SEED=15345 ; DEBUG=0 ; for(i=1;i999999 && x<=9999999) printf("%07d\n",x) ;
	}
}
     ' - | sort -n -u
	
#########################################################################
Make your own free website on Tripod.com