Memorising Pi: Difference between revisions

From Nottinghack Wiki
Jump to navigation Jump to search
mNo edit summary
assumed name is name Memorising Pi
 
(25 intermediate revisions by 3 users not shown)
Line 1: Line 1:
[[Category:Projects]]
{{Project
[[Category:Projects (in progress)]]
|name=Memorising Pi
[[Category:Michael's Projects]]
|primary=[[User:Msemtd|Michael Erskine]]
 
|created=08/02/2016
|status=In Progress
}}
On January 5th 2016 I set myself a goal to memorise Pi to 100 decimal places. My motivation for this was to learn the [[https://en.wikipedia.org/wiki/Mnemonic_major_system Major Mnemonic System]] and it's also a personal challenge. Prior to mid December 2015 I knew the five digits that I had learned at school: 3.14159. This had been sufficient for rough calculations with a calculator (for mental arithmetic I use the approximation of 22/7) and for any accuracy and programming I would use constants provided by the language or software. I had previously looked at the Major system but was unable to apply it due to lack of practice. So in December whilst discussing pi with one of my daughter's friends, she was able to recite 10 decimal places (3.1415926535) which I found to be quite impressive. So in the following weeks I learned the next 5 digits I needed to get up to 10 but without using a particular established method; just repetition. A short while later, at the start of January, I had a discussion with a friend at the Hackspace about memory sports and I decided to learn Major system properly with a well-defined goal: to be able to recite pi up to the 100th decimal place. Once I applied myself to the task I quickly exceeded to 100 digits and by the end of January I was up to 300. I set a further goal to memorise 1000 digits in time for [https://en.wikipedia.org/wiki/Pi_Day World Pi Day] 2016.  
On January 5th 2016 I set myself a goal to memorise Pi to 100 decimal places. My motivation for this was to learn the [[https://en.wikipedia.org/wiki/Mnemonic_major_system Major Mnemonic System]] and it's also a personal challenge. Prior to mid December 2015 I knew the five digits that I had learned at school: 3.14159. This had been sufficient for rough calculations with a calculator (for mental arithmetic I use the approximation of 22/7) and for any accuracy and programming I would use constants provided by the language or software. I had previously looked at the Major system but was unable to apply it due to lack of practice. So in December whilst discussing pi with one of my daughter's friends, she was able to recite 10 decimal places (3.1415926535) which I found to be quite impressive. So in the following weeks I learned the next 5 digits I needed to get up to 10 but without using a particular established method; just repetition. A short while later, at the start of January, I had a discussion with a friend at the Hackspace about memory sports and I decided to learn Major system properly with a well-defined goal: to be able to recite pi up to the 100th decimal place. Once I applied myself to the task I quickly exceeded to 100 digits and by the end of January I was up to 300. I set a further goal to memorise 1000 digits in time for [https://en.wikipedia.org/wiki/Pi_Day World Pi Day] 2016.  


Line 11: Line 13:
== Major System Software ==
== Major System Software ==


I found that although there were many online and offline software tools for the Major system, none of them fit nicely with my requirements. I started to work on my own software to find handy mnemonics from the digit sequences. Because the major system is based on pronunciation and not spelling, I started looking into ways of encoding sounds with systems like the [https://en.wikipedia.org/wiki/International_Phonetic_Alphabet International Phonetic Alphabet] (IPA). Again, there are many online resources for phonetics but I wanted to write my own software (as usual!). I had previously worked with speech recognition for the Cheesioid project and the [https://en.wikipedia.org/wiki/CMU_Sphinx CMU Sphinx software]; what I was looking for was a phonetic database of British English words that could be programmatically mapped to number sequences. The CMU Sphinx libraries make use of such a dictionary which uses the [https://en.wikipedia.org/wiki/Arpabet ARPABET] ASCII notation for the encoding of phonemes. I created a simple mapping of all possible phonemes to the equivalent major digits as follows...
I found that although there were many online and offline software tools for the Major system, none of them fit nicely with my requirements. I started to work on my own software to find handy mnemonics from the digit sequences. Because the major system is based on pronunciation and not spelling, I started looking into ways of encoding sounds with systems like the [https://en.wikipedia.org/wiki/International_Phonetic_Alphabet International Phonetic Alphabet] (IPA). Again, there are many online resources for phonetics but I wanted to write my own software (as usual!). I had previously worked with speech recognition for the Cheesioid project and the [https://en.wikipedia.org/wiki/CMU_Sphinx CMU Sphinx software]; what I was looking for was a phonetic database of British English words that could be programmatically mapped to number sequences. The CMU Sphinx libraries make use of such a dictionary, known as the [https://en.wikipedia.org/wiki/CMU_Pronouncing_Dictionary CMU Pronouncing Dictionary], which uses the [https://en.wikipedia.org/wiki/Arpabet ARPABET] ASCII notation for the encoding of phonemes.
 
Here are the valid ARPABET phonemes with examples: -
<pre>
    //  Phoneme Example Translation
    //  ------- ------- -----------
    //  AA      odd    AA D
    //  AE      at      AE T
    //  AH      hut    HH AH T
    //  AO      ought  AO T
    //  AW      cow    K AW
    //  AY      hide    HH AY D
    //  B      be      B IY
    //  CH      cheese  CH IY Z
    //  D      dee    D IY
    //  DH      thee    DH IY
    //  EH      Ed      EH D
    //  ER      hurt    HH ER T
    //  EY      ate    EY T
    //  F      fee    F IY
    //  G      green  G R IY N
    //  HH      he      HH IY
    //  IH      it      IH T
    //  IY      eat    IY T
    //  JH      gee    JH IY
    //  K      key    K IY
    //  L      lee    L IY
    //  M      me      M IY
    //  N      knee    N IY
    //  NG      ping    P IH NG
    //  OW      oat    OW T
    //  OY      toy    T OY
    //  P      pee    P IY
    //  R      read    R IY D
    //  S      sea    S IY
    //  SH      she    SH IY
    //  T      tea    T IY
    //  TH      theta  TH EY T AH
    //  UH      hood    HH UH D
    //  UW      two    T UW
    //  V      vee    V IY
    //  W      we      W IY
    //  Y      yield  Y IY L D
    //  Z      zee    Z IY
    //  ZH      seizure S IY ZH ER
 
</pre>
 
APRABET phonemes are modified into "symbols" by adding optional digits to indicate emphasis within the word.
The full set of possible ARPABET symbols used by CMU Sphinx dictionaries is provided in the CMU Sphinx downloads as file "cmudict-0.7b.symbols". This is a reasonably small set (84 symbols) and I was quickly able to create the following simple mapping of all possible APRABET symbols to the equivalent major digits by ignoring emphasis, vowels and other elements silent to Major system.
 
<div style ="height:200px;overflow-x:hidden;overflow-y:auto;border: 4px solid purple;">
 
{| class="wikitable"
{| class="wikitable"
|-
|-
! symbol !! major !! notes  
! symbol !! major !! notes  
|-
|-
|| AA ||  ||
|| AA ||  || vowels not encoded
|-
|-
|| AA0 ||  ||
|| AA0 ||  ||
Line 186: Line 240:
|}
|}


Armed with this table I was able to encode the entire CMU English dictionary of some 300000 words to Major system equivalents. I don't know if this has ever been done before but I am happy to share my work! --[[User:Msemtd|Michael Erskine]] ([[User talk:Msemtd|talk]]) 12:16, 8 February 2016 (UTC)
</div>
 
Armed with this table I was able to encode the entire CMU English dictionary (cmudict-0.7b) of some 133000 words to Major system equivalents. I don't know if this has ever been done before but I am happy to share my work! --[[User:Msemtd|Michael Erskine]] ([[User talk:Msemtd|talk]]) 12:16, 8 February 2016 (UTC)
 
== Software to process the CMU Dictionary ==
 
I have written some software that reads the CMU Sphinx dictionary in ARPABET format and produces a database table that attaches Major encodings to each word.
 
[[File:Cmudict-0.7b-major-system.png|thumbnail]]
 
<pre>
CREATE TABLE `cmudict_07b_words` (
`counter` INT(11) NULL DEFAULT NULL,
`line` INT(11) NULL DEFAULT NULL,
`word` VARCHAR(50) NULL DEFAULT NULL,
`maj` VARCHAR(50) NULL DEFAULT NULL,
`len` INT(11) NULL DEFAULT NULL,
`phones` VARCHAR(100) NULL DEFAULT NULL
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;
 
</pre>
 
A database is nice but it is much, much faster to slurp the whole CMU dictionary from the original text file and process it in memory. Here's an example of C# code to read the CMU dictionary into a list of objects (one-per-word) that are the same as the database above, with major translations. This is part of a class named "CmuDict". By the way: the software is in C# but not because the language is particularly suited to the task: it's just what I happen to be using right now. Any expressive language with closures/lambdas/functors will do the job (Scheme, Java, Perl, ML, Haskell, etc.)
 
<div style ="height:200px;overflow-x:hidden;overflow-y:auto;border: 4px solid green;">
<syntaxhighlight lang="csharp" line="GESHI_FANCY_LINE_NUMBERS">
 
        public event EventHandler<Dentry> OnEntry;
 
        /// <summary>
        /// Read and iterate all CMU dictionary entries and raise event for each entry
        /// </summary>
        public void MajorEntriesIterateAll()
        {
            try {
                using (StringReader r = new StringReader(gpsd_tests.Properties.Resources.cmudict_0_7b)) {
                    int line = 0;
                    int count = 0;
                    int problem_entry = -1;
                    string s;
                    StringBuilder sb = new StringBuilder();
                    while ((s = r.ReadLine()) != null) {
                        line++;
                        if (s.StartsWith(";;;"))
                            continue;
                        var ix = s.IndexOf("  ");
                        if (ix < 0) {
                            problem_entry = line;
                            continue;
                        }
                        if (!Char.IsLetter(s[0]))
                            continue;
 
                        var word = s.Substring(0, ix);
                        var phones = s.Substring(ix + 2);
                        var sa = phones.Split(' ');
                        sb.Clear();
                        foreach (var p in sa) {
                            string m = majmap[p];
                            sb.Append(m);
                        }
                        var maj = sb.ToString();
                        int ml = maj.Length;
                        count++;
                        if(OnEntry != null) {
                            OnEntry(this, new Dentry(count, line, word, maj, ml, phones));
                        }
                    }
                    log.Info("entries in CMU dictionary: " + line);
                    if (problem_entry != -1) {
                        log.Info("problem entry at " + problem_entry);
                    }
                }
            } catch (Exception x) {
                log.Error("failed: " + x.Message);
            }
        }
 
 
</syntaxhighlight>
</div>
 
The simple "Dentry" class and "majmap" hashtables are as follows...
 
<div style ="height:200px;overflow-x:hidden;overflow-y:auto;border: 4px solid blue;">
<syntaxhighlight lang="csharp" line="GESHI_FANCY_LINE_NUMBERS">
 
        public class Dentry
        {
            public int count;
            public int line;
            public string maj;
            public int ml;
            public string phones;
            public string word;
 
            public Dentry(int count, int line, string word, string maj, int ml, string phones)
            {
                this.count = count;
                this.line = line;
                this.word = word;
                this.maj = maj;
                this.ml = ml;
                this.phones = phones;
            }
        }
 
 
        public static Dictionary<string, string> majmap = new Dictionary<string, string>() {
            { "AA", ""}, {"AA0", ""}, {"AA1", ""}, {"AA2", ""}, {"AE", ""}, {"AE0", ""},
            { "AE1",""}, {"AE2", ""}, {"AH", ""}, {"AH0", ""}, {"AH1", ""}, {"AH2", ""},
            { "AO", ""}, {"AO0", ""}, {"AO1", ""}, {"AO2", ""}, {"AW", ""}, {"AW0", ""},
            { "AW1", ""}, {"AW2",""}, {"AY", ""}, {"AY0", ""}, {"AY1", ""}, {"AY2", ""},
            { "B", "9"}, {"CH", "6"}, {"D", "1"}, {"DH", "1"}, {"EH", ""}, {"EH0", ""},
            { "EH1", ""}, {"EH2", ""}, {"ER", "4"}, {"ER0", "4"}, {"ER1", "4"},
            { "ER2", "4"}, {"EY", ""}, {"EY0", ""}, {"EY1", ""}, {"EY2", ""},
            { "F", "8"}, {"G", "7"}, {"HH", ""}, {"IH", ""}, {"IH0", ""}, {"IH1", ""},
            {"IH2",""}, {"IY", ""}, {"IY0", ""}, {"IY1", ""}, {"IY2", ""}, {"JH", "6"}, {"K", "7"},
            {"L","5"}, {"M", "3"}, {"N", "2"}, {"NG", "2"}, {"OW", ""}, {"OW0", ""}, {"OW1", ""},
            {"OW2", ""}, {"OY", ""}, {"OY0", ""}, {"OY1", ""}, {"OY2", ""}, {"P", "9"}, {"R","4"},
            {"S", "0"}, {"SH", "6"}, {"T", "1"}, {"TH", "1"}, {"UH", ""}, {"UH0", ""}, {"UH1",""},
            {"UH2", ""}, {"UW", ""}, {"UW0", ""}, {"UW1", ""}, {"UW2", ""}, {"V", "8"}, {"W",""},
            {"Y", ""}, {"Z", "0"}, {"ZH", "6"},
        };
 
</syntaxhighlight>
</div>
 
An example usage of the iterator that just stores each word entry in a list is as follows...
 
<div style ="height:200px;overflow-x:hidden;overflow-y:auto;border: 4px solid red;">
<syntaxhighlight lang="csharp" line="GESHI_FANCY_LINE_NUMBERS">
 
        public static List<CmuDict.Dentry> ReadAllCmuMajor()
        {
            CmuDict c = new CmuDict();
            var d = new List<CmuDict.Dentry>();
            c.OnEntry += (sender, x) => d.Add(x);
            c.MajorEntriesIterateAll();
            return d;
        }
 
</syntaxhighlight>
</div>
 
And here's some code to search for all words in the first 100 digits of pi...
 
<div style ="height:200px;overflow-x:hidden;overflow-y:auto;border: 4px solid orange;">
<syntaxhighlight lang="csharp" line="GESHI_FANCY_LINE_NUMBERS">
 
            log.Info("Scanning for all words in first 100 digits of pi...");
            s1 = Stopwatch.StartNew();
            // Live scan of some pi digits while we have it in memory...
            var digits = PiDigits.GetDigitsString(0, 100);
            // split up into groups of 10 digits as I don't want to have words that span these boundaries - stick in some spacing...
            digits = Regex.Replace(digits, ".{10}", "$0 ");
            int minimumDigits = 2;
            foreach(var e in d) {
                if (e.ml < minimumDigits) continue;
                var pos = digits.IndexOf(e.maj);
                if (pos < 0) continue;
                var t = digits.Substring(0, pos) + "[" + digits.Substring(pos, e.ml) + "]" + digits.Substring(pos + e.ml);
                // do something with data...
                log.Debug("MAJOR! " + e.word + " length " + e.ml + " = " + e.maj + " found in target at " + pos + "    " + t);
            }
            s1.Stop();
            log.Info("Scanned for all words in first 100 digits of pi in " + s1.Elapsed.TotalMilliseconds + " ms");
 
</syntaxhighlight>
</div>
 
This takes 79ms on my machine when processing the list in memory. When reading from the database it takes more than 12 hours! The example just writes the found words to a log file but they could be held in memory and sorted by position found, then consecutive words could be selected and spoken by TTS (eSpeak, FreeTTS, Festival, or what have you). A genericised scanner that takes a lambda expression as a callback is shown below, but this takes approximately 100 times longer to run! It does, however, allow the lambda expression to end the iteration if desired, e.g. if we have a "good enough" result...
 
<div style ="height:200px;overflow-x:hidden;overflow-y:auto;border: 4px solid pink;">
<syntaxhighlight lang="csharp" line="GESHI_FANCY_LINE_NUMBERS">
 
        private static void DigitScanner(List<CmuDict.Dentry> d, string digits, int minimumDigits, Func<CmuDict.Dentry, int, string, bool> callback)
        {
            foreach (var e in d) {
                if (e.ml < minimumDigits) continue;
                var pos = digits.IndexOf(e.maj);
                if (pos < 0) continue;
                var map = digits.Substring(0, pos) + "[" + digits.Substring(pos, e.ml) + "]" + digits.Substring(pos + e.ml);
                // do something with data...
                if(callback != null) {
                    if (!callback.Invoke(e, pos, map))
                        return;
                }
            }
        }
 
 
// ...and here is how it is called...
 
            Func< CmuDict.Dentry, int, string, bool > callback = (CmuDict.Dentry e, int pos, string map) => {
                log.Debug("MAJOR! " + e.word + " length " + e.ml + " = " + e.maj + " found in target at " + pos + "    " + map);
                return true;
            };
            DigitScanner(d, digits, minimumDigits, callback);
 
</syntaxhighlight>
</div>
 
The output in the log file is fascinating...
 
<div style ="height:200px;overflow-x:hidden;overflow-y:auto;border: 4px solid yellow;">
<syntaxhighlight lang="text" line="GESHI_FANCY_LINE_NUMBERS">
2016-02-09 11:56:43.2722 DEBUG MAJOR! DROID length 3 = 141 found in target at 0    [141]5926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DROUGHT length 3 = 141 found in target at 0    [141]5926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DRUID length 3 = 141 found in target at 0    [141]5926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUARTE length 3 = 141 found in target at 0    [141]5926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUBACH length 3 = 197 found in target at 39    1415926535 8979323846 2643383279 502884[197]1 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUBUC length 3 = 197 found in target at 39    1415926535 8979323846 2643383279 502884[197]1 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUBUQUE length 3 = 197 found in target at 39    1415926535 8979323846 2643383279 502884[197]1 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUCK'S length 3 = 170 found in target at 103    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421[170]679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUCKIES length 3 = 170 found in target at 103    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421[170]679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUCKS length 3 = 170 found in target at 103    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421[170]679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUCKS' length 3 = 170 found in target at 103    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421[170]679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUDACK length 3 = 117 found in target at 102    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 342[117]0679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUDECK length 3 = 117 found in target at 102    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 342[117]0679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUDECK'S length 4 = 1170 found in target at 102    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 342[1170]679
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUDEK length 3 = 117 found in target at 102    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 342[117]0679
</syntaxhighlight>
</div>
 
Here's a handy bit of code that collects the hits in a List of Tuples, sorts it, and spits out a CSV file for perusal...
 
<div style ="height:200px;overflow-x:hidden;overflow-y:auto;border: 4px solid teal;">
<syntaxhighlight lang="csharp" line="GESHI_FANCY_LINE_NUMBERS">
 
            log.Info("Scanning again with a collecting callback...");
            s1 = Stopwatch.StartNew();
            minimumDigits = 2;
            List<Tuple<int, CmuDict.Dentry, string>> funStuff = new List<Tuple<int, CmuDict.Dentry, string>>();
            callback = (CmuDict.Dentry e, int pos, string map) => {
                funStuff.Add(new Tuple<int, CmuDict.Dentry, string>(pos, e, map));
                return true;
            };
            DigitScanner(d, digits, minimumDigits, callback);
            s1.Stop();
            log.Info("collecting callback found "+funStuff.Count+ "words in " + s1.Elapsed.TotalMilliseconds + " ms");
 
            // sort the collection by word position...
            log.Info("Sorting collected hits by hit position...");
            s1 = Stopwatch.StartNew();
 
            funStuff.Sort((x, y) => x.Item1.CompareTo(y.Item1));
            s1.Stop();
            log.Info("sort took " + s1.Elapsed.TotalMilliseconds + " ms");
            // Write to file...
            var csvFile = "sorted_major_pi100.csv";
            log.Info("Writing results to file: " + csvFile);
            try {
                var sw = new StreamWriter(csvFile);
                sw.WriteLine("{0},{1},{2},{3},{4},{5}", "position", "word", "maj", "len", "phones", "map");
                foreach (var a in funStuff) {
                    var e = a.Item2;
                    sw.WriteLine("{0},{1},{2},{3},{4},{5}", a.Item1, e.word, e.maj, e.ml, e.phones, a.Item3);
                }
                sw.Close();
            } catch (Exception x) {
                log.Error("failed: " + x.Message);
            }
            log.Info("Finished writing results to file: " + csvFile);
</syntaxhighlight>
</div>
 
Entries can then be selected to make a fun and memorable sentence...
 
<div style ="height:200px;overflow-x:hidden;overflow-y:auto;border: 4px solid yellow;">
<syntaxhighlight lang="text" line="GESHI_FANCY_LINE_NUMBERS">
0 DROID 141 3 D R OY1 D [141]5926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
3 LEAPING 592 3 L IY1 P IH0 NG 141[592]6535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
6 SHYLY 65 2 SH AY1 L IY0 141592[65]35 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
8 HAMMILL 35 2 HH AE1 M AH0 L 14159265[35] 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
 
11 PHOBIC 897 3 F OW1 B IH0 K 1415926535 [897]9323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
14 BOMBING 932 3 B AA1 M IH0 NG 1415926535 897[932]3846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
17 MAFIA 38 2 M AA1 F IY0 AH0 1415926535 897932[38]46 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
19 ARCHWAY 46 2 AA1 R CH W EY2 1415926535 89793238[46] 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
 
22 ONSHORE 264 3 AA1 N SH AO2 R 1415926535 8979323846 [264]3383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
25 MAIM 33 2 M EY1 M 1415926535 8979323846 264[33]83279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
27 FOAMY 83 2 F OW1 M IY0 1415926535 8979323846 26433[83]279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
29 KNEECAP 279 3 N IY1 K AE2 P 1415926535 8979323846 2643383[279] 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679
</syntaxhighlight>
</div>
[[Category:Michael's Projects]]

Latest revision as of 06:05, 10 February 2019

Memorising Pi
[[{{{image}}}|border|frameless|220px|center]]
Primary Contact Michael Erskine
Created 08/02/2016
Completed {{{completeddate}}}
Dormant {{{dormantdate}}}
Version {{{version}}}
Members {{{members}}}
Manufacturer {{{manufacturer}}}
Model {{{model}}}
Location [[{{{location}}}]]
GitHub / Repo {{{repo}}}
Status In Progress
Type Undefined
Live Status {{{livestatus}}}
QR code

On January 5th 2016 I set myself a goal to memorise Pi to 100 decimal places. My motivation for this was to learn the [Major Mnemonic System] and it's also a personal challenge. Prior to mid December 2015 I knew the five digits that I had learned at school: 3.14159. This had been sufficient for rough calculations with a calculator (for mental arithmetic I use the approximation of 22/7) and for any accuracy and programming I would use constants provided by the language or software. I had previously looked at the Major system but was unable to apply it due to lack of practice. So in December whilst discussing pi with one of my daughter's friends, she was able to recite 10 decimal places (3.1415926535) which I found to be quite impressive. So in the following weeks I learned the next 5 digits I needed to get up to 10 but without using a particular established method; just repetition. A short while later, at the start of January, I had a discussion with a friend at the Hackspace about memory sports and I decided to learn Major system properly with a well-defined goal: to be able to recite pi up to the 100th decimal place. Once I applied myself to the task I quickly exceeded to 100 digits and by the end of January I was up to 300. I set a further goal to memorise 1000 digits in time for World Pi Day 2016.

My memorisation process is really just an exercise in creative writing! I like to work in groups of 10 digits at a time. Each group is remembered as a sentence of 3 to 5 words and each word encodes perhaps 2 to 5 digits. The sentences are not proper English though; it would be way too difficult to try and find the right words in the Major system. The sentences are just keywords and carry a general meaning as part of my overall story. The theme of my story is a large battle set on a Shell World from a science fiction novel by Iain M Banks. In order to be memorable, the story contains peril, includes all the senses, and has remarkable and bizarre happenings featuring familiar people and objects from my everyday life.

--Michael Erskine (talk) 11:56, 8 February 2016 (UTC)

Major System Software

I found that although there were many online and offline software tools for the Major system, none of them fit nicely with my requirements. I started to work on my own software to find handy mnemonics from the digit sequences. Because the major system is based on pronunciation and not spelling, I started looking into ways of encoding sounds with systems like the International Phonetic Alphabet (IPA). Again, there are many online resources for phonetics but I wanted to write my own software (as usual!). I had previously worked with speech recognition for the Cheesioid project and the CMU Sphinx software; what I was looking for was a phonetic database of British English words that could be programmatically mapped to number sequences. The CMU Sphinx libraries make use of such a dictionary, known as the CMU Pronouncing Dictionary, which uses the ARPABET ASCII notation for the encoding of phonemes.

Here are the valid ARPABET phonemes with examples: -

    //  Phoneme Example Translation
    //  ------- ------- -----------
    //  AA      odd     AA D
    //  AE      at      AE T
    //  AH      hut     HH AH T
    //  AO      ought   AO T
    //  AW      cow     K AW
    //  AY      hide    HH AY D
    //  B       be      B IY
    //  CH      cheese  CH IY Z
    //  D       dee     D IY
    //  DH      thee    DH IY
    //  EH      Ed      EH D
    //  ER      hurt    HH ER T
    //  EY      ate     EY T
    //  F       fee     F IY
    //  G       green   G R IY N
    //  HH      he      HH IY
    //  IH      it      IH T
    //  IY      eat     IY T
    //  JH      gee     JH IY
    //  K       key     K IY
    //  L       lee     L IY
    //  M       me      M IY
    //  N       knee    N IY
    //  NG      ping    P IH NG
    //  OW      oat     OW T
    //  OY      toy     T OY
    //  P       pee     P IY
    //  R       read    R IY D
    //  S       sea     S IY
    //  SH      she     SH IY
    //  T       tea     T IY
    //  TH      theta   TH EY T AH
    //  UH      hood    HH UH D
    //  UW      two     T UW
    //  V       vee     V IY
    //  W       we      W IY
    //  Y       yield   Y IY L D
    //  Z       zee     Z IY
    //  ZH      seizure S IY ZH ER

APRABET phonemes are modified into "symbols" by adding optional digits to indicate emphasis within the word. The full set of possible ARPABET symbols used by CMU Sphinx dictionaries is provided in the CMU Sphinx downloads as file "cmudict-0.7b.symbols". This is a reasonably small set (84 symbols) and I was quickly able to create the following simple mapping of all possible APRABET symbols to the equivalent major digits by ignoring emphasis, vowels and other elements silent to Major system.

symbol major notes
AA vowels not encoded
AA0
AA1
AA2
AE
AE0
AE1
AE2
AH
AH0
AH1
AH2
AO
AO0
AO1
AO2
AW
AW0
AW1
AW2
AY
AY0
AY1
AY2
B 9
CH 6
D 1
DH 1
EH
EH0
EH1
EH2
ER 4
ER0 4
ER1 4
ER2 4
EY
EY0
EY1
EY2
F 8
G 7
HH
IH
IH0
IH1
IH2
IY
IY0
IY1
IY2
JH 6
K 7
L 5
M 3
N 2
NG 2
OW
OW0
OW1
OW2
OY
OY0
OY1
OY2
P 9
R 4
S 0
SH 6
T 1
TH 1
UH
UH0
UH1
UH2
UW
UW0
UW1
UW2
V 8
W
Y
Z 0
ZH 6

Armed with this table I was able to encode the entire CMU English dictionary (cmudict-0.7b) of some 133000 words to Major system equivalents. I don't know if this has ever been done before but I am happy to share my work! --Michael Erskine (talk) 12:16, 8 February 2016 (UTC)

Software to process the CMU Dictionary

I have written some software that reads the CMU Sphinx dictionary in ARPABET format and produces a database table that attaches Major encodings to each word.

CREATE TABLE `cmudict_07b_words` (
	`counter` INT(11) NULL DEFAULT NULL,
	`line` INT(11) NULL DEFAULT NULL,
	`word` VARCHAR(50) NULL DEFAULT NULL,
	`maj` VARCHAR(50) NULL DEFAULT NULL,
	`len` INT(11) NULL DEFAULT NULL,
	`phones` VARCHAR(100) NULL DEFAULT NULL
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;

A database is nice but it is much, much faster to slurp the whole CMU dictionary from the original text file and process it in memory. Here's an example of C# code to read the CMU dictionary into a list of objects (one-per-word) that are the same as the database above, with major translations. This is part of a class named "CmuDict". By the way: the software is in C# but not because the language is particularly suited to the task: it's just what I happen to be using right now. Any expressive language with closures/lambdas/functors will do the job (Scheme, Java, Perl, ML, Haskell, etc.)

        public event EventHandler<Dentry> OnEntry;

        /// <summary>
        /// Read and iterate all CMU dictionary entries and raise event for each entry
        /// </summary>
        public void MajorEntriesIterateAll()
        {
            try {
                using (StringReader r = new StringReader(gpsd_tests.Properties.Resources.cmudict_0_7b)) {
                    int line = 0;
                    int count = 0;
                    int problem_entry = -1;
                    string s;
                    StringBuilder sb = new StringBuilder();
                    while ((s = r.ReadLine()) != null) {
                        line++;
                        if (s.StartsWith(";;;"))
                            continue;
                        var ix = s.IndexOf("  ");
                        if (ix < 0) {
                            problem_entry = line;
                            continue;
                        }
                        if (!Char.IsLetter(s[0]))
                            continue;

                        var word = s.Substring(0, ix);
                        var phones = s.Substring(ix + 2);
                        var sa = phones.Split(' ');
                        sb.Clear();
                        foreach (var p in sa) {
                            string m = majmap[p];
                            sb.Append(m);
                        }
                        var maj = sb.ToString();
                        int ml = maj.Length;
                        count++;
                        if(OnEntry != null) {
                            OnEntry(this, new Dentry(count, line, word, maj, ml, phones));
                        }
                    }
                    log.Info("entries in CMU dictionary: " + line);
                    if (problem_entry != -1) {
                        log.Info("problem entry at " + problem_entry);
                    }
                }
            } catch (Exception x) {
                log.Error("failed: " + x.Message);
            }
        }

The simple "Dentry" class and "majmap" hashtables are as follows...

        public class Dentry
        {
            public int count;
            public int line;
            public string maj;
            public int ml;
            public string phones;
            public string word;

            public Dentry(int count, int line, string word, string maj, int ml, string phones)
            {
                this.count = count;
                this.line = line;
                this.word = word;
                this.maj = maj;
                this.ml = ml;
                this.phones = phones;
            }
        }


        public static Dictionary<string, string> majmap = new Dictionary<string, string>() {
            { "AA", ""}, {"AA0", ""}, {"AA1", ""}, {"AA2", ""}, {"AE", ""}, {"AE0", ""},
            { "AE1",""}, {"AE2", ""}, {"AH", ""}, {"AH0", ""}, {"AH1", ""}, {"AH2", ""},
            { "AO", ""}, {"AO0", ""}, {"AO1", ""}, {"AO2", ""}, {"AW", ""}, {"AW0", ""},
            { "AW1", ""}, {"AW2",""}, {"AY", ""}, {"AY0", ""}, {"AY1", ""}, {"AY2", ""},
            { "B", "9"}, {"CH", "6"}, {"D", "1"}, {"DH", "1"}, {"EH", ""}, {"EH0", ""},
            { "EH1", ""}, {"EH2", ""}, {"ER", "4"}, {"ER0", "4"}, {"ER1", "4"},
            { "ER2", "4"}, {"EY", ""}, {"EY0", ""}, {"EY1", ""}, {"EY2", ""},
            { "F", "8"}, {"G", "7"}, {"HH", ""}, {"IH", ""}, {"IH0", ""}, {"IH1", ""},
            {"IH2",""}, {"IY", ""}, {"IY0", ""}, {"IY1", ""}, {"IY2", ""}, {"JH", "6"}, {"K", "7"},
            {"L","5"}, {"M", "3"}, {"N", "2"}, {"NG", "2"}, {"OW", ""}, {"OW0", ""}, {"OW1", ""},
            {"OW2", ""}, {"OY", ""}, {"OY0", ""}, {"OY1", ""}, {"OY2", ""}, {"P", "9"}, {"R","4"},
            {"S", "0"}, {"SH", "6"}, {"T", "1"}, {"TH", "1"}, {"UH", ""}, {"UH0", ""}, {"UH1",""},
            {"UH2", ""}, {"UW", ""}, {"UW0", ""}, {"UW1", ""}, {"UW2", ""}, {"V", "8"}, {"W",""},
            {"Y", ""}, {"Z", "0"}, {"ZH", "6"},
        };

An example usage of the iterator that just stores each word entry in a list is as follows...

        public static List<CmuDict.Dentry> ReadAllCmuMajor()
        {
            CmuDict c = new CmuDict();
            var d = new List<CmuDict.Dentry>();
            c.OnEntry += (sender, x) => d.Add(x);
            c.MajorEntriesIterateAll();
            return d;
        }

And here's some code to search for all words in the first 100 digits of pi...

            log.Info("Scanning for all words in first 100 digits of pi...");
            s1 = Stopwatch.StartNew();
            // Live scan of some pi digits while we have it in memory...
            var digits = PiDigits.GetDigitsString(0, 100);
            // split up into groups of 10 digits as I don't want to have words that span these boundaries - stick in some spacing...
            digits = Regex.Replace(digits, ".{10}", "$0 ");
            int minimumDigits = 2;
            foreach(var e in d) {
                if (e.ml < minimumDigits) continue;
                var pos = digits.IndexOf(e.maj);
                if (pos < 0) continue;
                var t = digits.Substring(0, pos) + "[" + digits.Substring(pos, e.ml) + "]" + digits.Substring(pos + e.ml);
                // do something with data...
                log.Debug("MAJOR! " + e.word + " length " + e.ml + " = " + e.maj + " found in target at " + pos + "    " + t);
            }
            s1.Stop();
            log.Info("Scanned for all words in first 100 digits of pi in " + s1.Elapsed.TotalMilliseconds + " ms");

This takes 79ms on my machine when processing the list in memory. When reading from the database it takes more than 12 hours! The example just writes the found words to a log file but they could be held in memory and sorted by position found, then consecutive words could be selected and spoken by TTS (eSpeak, FreeTTS, Festival, or what have you). A genericised scanner that takes a lambda expression as a callback is shown below, but this takes approximately 100 times longer to run! It does, however, allow the lambda expression to end the iteration if desired, e.g. if we have a "good enough" result...

        private static void DigitScanner(List<CmuDict.Dentry> d, string digits, int minimumDigits, Func<CmuDict.Dentry, int, string, bool> callback)
        {
            foreach (var e in d) {
                if (e.ml < minimumDigits) continue;
                var pos = digits.IndexOf(e.maj);
                if (pos < 0) continue;
                var map = digits.Substring(0, pos) + "[" + digits.Substring(pos, e.ml) + "]" + digits.Substring(pos + e.ml);
                // do something with data...
                if(callback != null) {
                    if (!callback.Invoke(e, pos, map))
                        return;
                }
            }
        }


// ...and here is how it is called...

            Func< CmuDict.Dentry, int, string, bool > callback = (CmuDict.Dentry e, int pos, string map) => {
                log.Debug("MAJOR! " + e.word + " length " + e.ml + " = " + e.maj + " found in target at " + pos + "    " + map);
                return true;
            };
            DigitScanner(d, digits, minimumDigits, callback);

The output in the log file is fascinating...

2016-02-09 11:56:43.2722 DEBUG MAJOR! DROID length 3 = 141 found in target at 0    [141]5926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DROUGHT length 3 = 141 found in target at 0    [141]5926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DRUID length 3 = 141 found in target at 0    [141]5926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUARTE length 3 = 141 found in target at 0    [141]5926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUBACH length 3 = 197 found in target at 39    1415926535 8979323846 2643383279 502884[197]1 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUBUC length 3 = 197 found in target at 39    1415926535 8979323846 2643383279 502884[197]1 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUBUQUE length 3 = 197 found in target at 39    1415926535 8979323846 2643383279 502884[197]1 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUCK'S length 3 = 170 found in target at 103    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421[170]679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUCKIES length 3 = 170 found in target at 103    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421[170]679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUCKS length 3 = 170 found in target at 103    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421[170]679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUCKS' length 3 = 170 found in target at 103    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421[170]679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUDACK length 3 = 117 found in target at 102    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 342[117]0679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUDECK length 3 = 117 found in target at 102    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 342[117]0679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUDECK'S length 4 = 1170 found in target at 102    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 342[1170]679 
2016-02-09 11:56:43.2722 DEBUG MAJOR! DUDEK length 3 = 117 found in target at 102    1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 342[117]0679

Here's a handy bit of code that collects the hits in a List of Tuples, sorts it, and spits out a CSV file for perusal...

            log.Info("Scanning again with a collecting callback...");
            s1 = Stopwatch.StartNew();
            minimumDigits = 2;
            List<Tuple<int, CmuDict.Dentry, string>> funStuff = new List<Tuple<int, CmuDict.Dentry, string>>();
            callback = (CmuDict.Dentry e, int pos, string map) => {
                funStuff.Add(new Tuple<int, CmuDict.Dentry, string>(pos, e, map));
                return true;
            };
            DigitScanner(d, digits, minimumDigits, callback);
            s1.Stop();
            log.Info("collecting callback found "+funStuff.Count+ "words in " + s1.Elapsed.TotalMilliseconds + " ms");

            // sort the collection by word position...
            log.Info("Sorting collected hits by hit position...");
            s1 = Stopwatch.StartNew();

            funStuff.Sort((x, y) => x.Item1.CompareTo(y.Item1));
            s1.Stop();
            log.Info("sort took " + s1.Elapsed.TotalMilliseconds + " ms");
            // Write to file...
            var csvFile = "sorted_major_pi100.csv";
            log.Info("Writing results to file: " + csvFile);
            try {
                var sw = new StreamWriter(csvFile);
                sw.WriteLine("{0},{1},{2},{3},{4},{5}", "position", "word", "maj", "len", "phones", "map");
                foreach (var a in funStuff) {
                    var e = a.Item2;
                    sw.WriteLine("{0},{1},{2},{3},{4},{5}", a.Item1, e.word, e.maj, e.ml, e.phones, a.Item3);
                }
                sw.Close();
            } catch (Exception x) {
                log.Error("failed: " + x.Message);
            }
            log.Info("Finished writing results to file: " + csvFile);

Entries can then be selected to make a fun and memorable sentence...

0	DROID	141	3	D R OY1 D	[141]5926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
3	LEAPING	592	3	L IY1 P IH0 NG	141[592]6535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
6	SHYLY	65	2	SH AY1 L IY0	141592[65]35 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
8	HAMMILL	35	2	HH AE1 M AH0 L	14159265[35] 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 

11	PHOBIC	897	3	F OW1 B IH0 K	1415926535 [897]9323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
14	BOMBING	932	3	B AA1 M IH0 NG	1415926535 897[932]3846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
17	MAFIA	38	2	M AA1 F IY0 AH0	1415926535 897932[38]46 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
19	ARCHWAY	46	2	AA1 R CH W EY2	1415926535 89793238[46] 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 

22	ONSHORE	264	3	AA1 N SH AO2 R	1415926535 8979323846 [264]3383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
25	MAIM	33	2	M EY1 M	1415926535 8979323846 264[33]83279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
27	FOAMY	83	2	F OW1 M IY0	1415926535 8979323846 26433[83]279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 
29	KNEECAP	279	3	N IY1 K AE2 P	1415926535 8979323846 2643383[279] 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679