Programmer Of The Month (POTM) contest ... THE SUMMER EDITION ....
Deadline for this one is midnight on Tuesday August 31, 1993.
How hard is it to look up a word in a dictionary when
you don't know how to spell the word?
How hard is it to find the "correct" number in a list
when all you have is a number where you think that
"most" of the digits are OK?
This latter question is the subject of the Summer POTM: MR. NOISE!
============================================================================
Here is the model ...
-------------
TRUE NUMBERS -> | Mr. Noise | -> CORRUPTED NUMBERS -> ----------------
/\ ------------- | |
/||\ | Your Program |
|| | |
LIST OF GOOD NUMBERS -------------------------------> ----------------
||
\||/
I'll tell you all about what Mr. Noise does to the \/
numbers, and I'll score you on how well YOUR GUESSES YOUR GUESSES
match the TRUE NUMBERS I start with. In fact,
the actual Mr.Noise shell program is provided below!
============================================================================
If you deduce a TRUE NUMBER you get a reward (10). If you give up,
you get a smaller reward (3). If you guess WRONG, you get nothing!
(For example, the TRUE number is 5551212. Mr. Noise corrupted it
to 55612192. You have to determine that the intended number in the
good number list was really 5551212.)
THE GORY DETAILS FOLLOW ... READ ON IF YOU WANT TO TACKLE THE CHALLENGE!
IF NOT .... SEE YOU NEXT TIME!!! (Please join us ... the solution
to this one may even have AT&T related application use!)
=Fred (homxb!fah) (Send mail if you want to be taken OFF the mailing list!)
============================= THE DETAILS =============================
YOUR PROGRAM:
1) Assuming your program is a.out, then it should take two arguments:
a.out /tmp/good_nums /tmp/test_list
The first arg (/tmp/good_nums for example) is the name of a file
containing valid numbers. /tmp/good_nums contains one seven digit
number per line, with no white space. This file will not contain
numbers beginning with "0". The file will be numerically sorted
and will contain no repetitions. (A shell/awk program is provided
to generate a sample list of valid random numbers.)
The second arg (/tmp/test_list for example) is the name of a file
containing digit strings to be tested. This input file contains
one number per line, between 1 and 14 digits each, with no white
space. There MAY be leading zeroes in these numbers. There MAY
be repeats and it need not be sorted. (I will generate this file
by corrupting a subset of the valid numbers with the MR.NOISE shell
using some fixed set of probabilities.)
2) Your program should generate, to standard output, exactly one output
number for each input number in /tmp/test_list (the second arg.).
Line N of the output should contain the results of your test of
line N of the input - and should be either a valid number
from /tmp/good_nums or "0000000" to signify that you chose not to
make a guess on this input line. (If your output is NOT in the
/tmp/good_nums list, it will be treated as a bad guess.)
3) Name your program something clever ... the last set of program names
were particularly boring.
THE DIGITS:
1) All "digits" are zero through 9, provided via [0123456789]
2) The "valid" numbers in the good_nums file will be seven digits long.
They may begin with any digit except 0 (zero). They thus
range from 1000000 to 9999999 numerically.
3) I will corrupt the seven digit number and will return a digit string
using the digits [0-9]. This digit string will be between 1 and
14 digits long (no white space). See the section on CORRUPTION below.
CORRUPTION ... (I am Mr. Noise ... I will wreak havoc ...):
1) I will corrupt a seven digit number by examining each digit in turn,
and applying one or more of three different corruptions to that
digit based on probabilities generated from random numbers.
2) With probability P(change) I will change the single digit to some
other digit. (e.g. 9 -> 4)
3) With probability P(remove) I will delete the digit altogether.
4) With probability P(add) I will add a random digit to the string
following this digit. (e.g. 3 -> 37 )
5) With probability P(surprise) I will generate a completely random
seven digit number (on a separate line) in addition to corrupting
the input number.
6) An "awk" program is provided ... this is what I will use to corrupt
the seven digit number. (Note that if your version of awk does not
support the "rand" function, try "nawk" or "xawk".)
7) The probabilities for each digit are fixed for each test run and follow:
0. <= P(change) <= .20
0. <= P(remove) <= .20
0. <= P(add) <= .20 (given a digit was not removed)
0. <= P(surprise) <= .10 (applied once per input number)
The probabilities will be applied to each digit of the TRUE number
as demonstrated in the awk program.
8) As shown by the awk program, I may either add or delete a digit - but not
both. If I change a digit, I may then add another or delete that
digit. Independent of what I changed, added, or deleted - I may
add at most one "surprise" number for each input number.
THE SCORING:
The scoring is based on the numbers you return on standard output:
1) If the number is 0000000 it will be assumed that you decided not
to correct the number - you will be awarded 3 points regardless
of what the "true" input was. (Exception: see #4 below)
2) If you return the CORRECT uncorrupted 7 digit number, you will be
awarded 10 points. CORRECT means that you returned the number
that I started with before I applied the corruption - I will
determine which number I started with based on the list I gave
to the "Mr.Noise" awk script. (It is possible that you return
a "closer" entry than the one I started with ... no matter, it
is MY number that earns you the points - the one that would be
printed if DEBUG=1 in the awk script.)
3) If you return an INCORRECT 7 digit number, you get no points.
4) When "Mr.Noise" adds a random number (with P(surprise)), I will
assume that "0000000" is a CORRECT guess (worth 10 points).
If "Mr.Noise" adds a random number that happens to be on the
good_nums list, I will remove the number from the list before
running the tests.
5) Thus, for 1,000 input numbers, the worst score would be zero. The
best score would be 10,000. If your program does absolutely
nothing except output 0000000 every time, your score will be
at least 3,000.
============================= THE JUDGING =============================
1) For system test, I will generate a valid good_nums list of length
5,000 and five lists of 1000 test numbers each using the awk
scripts included below. The five tests will each have different
probabilities used. Your score will be the number of points
earned when I compare your five outputs with the five perfect
inputs I use to drive the awk script. As always, you will be
mailed your system test results (as well as those of other players!)
(Note that with DEBUG=1, the awk script will generate the
CORRECT numbers as well as the corrupted numbers.)
2) For the finals, I will generate 10 such lists - each of which will
contain between 100 and 5,000 test entries. These will be played
against valid number files containing between 500 and 10,000
entries each.
3) Final score will be the sum of the scores on the ten runs.
4) In the (unlikely??) event of a tie, the quickest (sys+user) program
will win. Unlike the CAT->DOG POTM, I do NOT expect to have to
resort to this tie-breaker for MR.NOISE.
5) Any arbitrary decisions made by me along the way will be valid.
============================= THE ENVIRONMENT =========================
I run on an Amdahl 5870 ... homxb by name. You supply me with source code.
(I can handle K&R C code (no ANSI), C++, or anything supported by exptools.
I'd be willing to try anything you think might work ... so try!)
I will "make" the code when I receive it and get back to you immediately (?)
with your system test results. In the past, it has proven wise to submit
your entry EARLY so that we can work through any portability issues via email.
I will distribute weekly or bi-weekly leader boards based on the system tests
ONLY to those folks who have sent me an entry to that date.
a) No include files other than "stdio.h" in C. For C++ add "stdlib.h"
and "string.h". Others by special request only.
b) No non-standard libraries are permitted.
c) sys+user time on the five system tests must total less than 300 seconds
in order to qualify for the finals.
d) Any temp files must be created in /tmp and removed when done.
e) No assembler code (too much advantage for local Amdahlers).
f) Time is measured with timex, sys+user, for better or worse.
g) Standard compile line is "cc -O" for C programs, others as provided.
h) Program must end normally - no core dumps!
i) Your submission may be distributed freely and used by anyone for anything
j) this list will grow for future POTMs ...
============================= SUBMITTING AN ENTRY ====================
Use "uuto filename homxb!fah" to send me an entry ... please accompany it
with some mail explaining how to compile the entry. If you prefer, you
may send your entry via email to homxb!fah or attmail!hicinbothem.
I will not examine your source since I will be competing, but I will ship you
lint output, or compile errors, or stuff like that if things don't work
out the first time!
You may submit multiple entries, but ONLY YOUR LAST ENTRY WILL BE SAVED.
I must have a single email address and name to associate with each entry,
even if you choose to work in teams - all results will go to this individual.
Questions submitted via email will be answered.
All entries must be received by the closing date/time - as measured in NJ!
============================= TWO USEFUL SHELL PROGRAMS ==============
The following are two programs which should prove helpful in testing
your programs. They are the programs I will use for testing them!
############################# CUT HERE ##################################
#
# This program is Mr.Noise. It is an shell/awk program that uses the built
# in function "rand". If the version of awk on your system does not
# support "rand" ... try xawk (exptools) or nawk. If this still does
# not work ... write a C program and accept my apologies!
#
# This program expects a standard input containing seven digit numbers,
# one per line. It will corrupt these numbers and
# produce a corresponding string of corrupted digits on stdout.
# Every now and then it will add a completely random number.
#
# The loop at the front contains a SEED that will be changed by ME when
# I run the tests ... so don't try and predict the errors precisely!
# Note that SEED should be pretty big to avoid generating numbers that
# might also be in the dictionary (the other shell program).
# During testing I will also modify the four probabilities (see rules).
#
# Setting DEBUG=1 will cause the program to print out TWO numbers per
# line - the corrupted number followed by the "Correct" number
sed "s/./& /g" | # put spaces in the numbers for awk
xawk ' BEGIN { SEED=15345 ; DEBUG=0 ; for(i=1;i999999 && x<=9999999) printf("%07d\n",x) ;
}
}
' - | sort -n -u
#########################################################################