Organization Name: Pharo Consortium

Contributor: Neerja Doshi

Mentors: Domenico Cipriani, Nahuel Palumbo

Project Repository: PAM-GSoC-25-Project

Project Duration: May 9, 2025 – September 1, 2025

My Deliverables 😎

TTS conversion code snippet and its Transcript (Logs)

Paste these code snippets into the Playground to hear PAM speak! ❤️


        | dsp |
        
        "initialise the DSP class"
        dsp := PAMDsp new.
        
        "audio for an array of numbers"
        dsp sayNumbers: #(5 6 7 2).
        
        "audio for all numbers from 0 to 9"
        dsp sayNumbers0to9.
        
        "audio for sentence"
        dsp sayText: 'Dogs'.
        
        "audio for sentence with prosody and child preset"
        dsp sayTextWithProsody: 'Dogs' asChild: true.
        
        "audio for sentence with prosody and adult preset"
        dsp sayTextWithProsody: 'golf day!' asChild: false.
        
        "Transcript for logs"
        Transcript open.

💡 Head over to PAM-Core >> PAMDspExamples >> class-side and click the notepad icon next to
sayNumbers0to9, sayText:, or sayTextWithProsody:asChild: to hear them speak! (A quick and easy way to test TTS 🎙️)

Project Abstract 😎

PAM (Pharo Automated Mouth) is a rule-based Text-to-Speech (TTS) system developed for the Pharo environment. Unlike modern deep-learning TTS systems that require extensive datasets and computational resources, PAM employs a lightweight rule-based approach inspired by the classic SAM (Software Automatic Mouth) system.

The system implements text-to-phoneme conversion using 402 English → Grapheme rules and 47 Grapheme → IPA phoneme rules, producing accurate IPA phoneme sequences with stress markers.

For synthesis, PAM adopts a concatenative sample-based approach: instead of generating audio purely algorithmically, it plays back pre-recorded audio samples of phonemes. These are sequenced and stitched together through a TpSampler-driven DSP pipeline (integrated with Phausto), enabling intelligible speech output.

On top of this, PAM introduces prosody control, allowing each phoneme playback to be modulated by parameters such as pitch (adult/child voice presets), amplitude (stress-based loudness), and duration (temporal stretching). This hybrid design — rule-based phoneme generation + sample-based concatenative synthesis + prosodic control — makes PAM lightweight and customizable while still producing natural-sounding speech.

Core Architecture

PAM consists of four main components:

Reciter Engine: Class Reciter → Converts English text to graphemes (SAM phonemes) using pattern-matching rules adapted from SAM.

IPA Converter: Class PhonemeToIPAConvertor → Transforms graphemes into International Phonetic Alphabet (IPA) format of phonemes.

Audio Synthesis: Class PAMDsp → Leverages Phausto’s DSP capabilities for speech generation using pre-recorded audio samples of phonemes.

Prosody Generation: Class Parser → Gives prosody to PAM by tweaking frequency, amplitude, length dynamics, and generating age-like voice variations.

Major Accomplishments ❤️

1. Complete Rule Engine Implementation (June–July 2025)

Duration: ~6 weeks

Achievement: Successfully implemented all 26 letter rules with comprehensive test coverage.

Created a hierarchical rule system with a parent LetterRules class and 26 child classes (A–Z)
Total rule count: 402
Implemented special character matching for prefix/suffix contexts (#, ^, @, %)
Added comprehensive test suite covering all alphabetic conversions
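
To make the prefix(pattern)suffix idea concrete, here is a minimal Playground sketch that splits a rule string into its three parts. It is not PAM's actual parser: the block name splitRule and the sample rule '#(B)^' are made up for illustration, and the sketch does not interpret what #, ^, @ or % stand for.

```smalltalk
"Hypothetical helper: split a rule string of the form prefix(pattern)suffix.
 This is an illustration only, not the parser used inside PAM."
| splitRule |
splitRule := [ :ruleString |
    | prefix rest pattern suffix |
    "Everything before the opening parenthesis is the prefix context."
    prefix := ruleString copyUpTo: $(.
    rest := ruleString copyAfter: $(.
    "The text inside the parentheses is the pattern to replace."
    pattern := rest copyUpTo: $).
    "Everything after the closing parenthesis is the suffix context."
    suffix := rest copyAfter: $).
    { prefix. pattern. suffix } ].

"A rule from this report: no prefix or suffix context."
splitRule value: '(BREAK)'.

"A made-up rule with one context character on each side."
splitRule value: '#(B)^'.
```

Keeping the three parts separate is what lets the context characters be checked against the neighbouring letters without being consumed by the replacement.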


      "Gasoline test"
      
      | phonemeOutput inputText |
      inputText := 'Gasoline'.
      
      phonemeOutput := Reciter textToPhonemes: inputText.
      
      self assert: phonemeOutput equals: #('G' 'EY4S' 'OW' 'L' 'IH' 'N' 'EH').

Test Results: Words covered by the test suite convert to the expected phonemes, e.g. Gasoline → ('G' 'EY4S' 'OW' 'L' 'IH' 'N' 'EH').

2. IPA Phoneme Conversion System (July–August 2025)

Duration: ~3 weeks

Achievement: Developed a recursive SAM phoneme (grapheme) to IPA phoneme converter for audio file mapping.

Pseudocode for Recursive compound phoneme splitting


      splitCompoundPhoneme: 'RIY'
          → splits to: #('R' 'IY')  
          → converts to: #('r' 'i_colon')
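
The pseudocode above can be sketched as a small recursive block for the Playground. This is only my illustration of the greedy "longest match first" idea under simplifying assumptions: the two-entry ipaTable is a stand-in for the real 47-rule mapping, the name split is hypothetical, and stress digits are not handled.

```smalltalk
"Hypothetical sketch of greedy, recursive compound-phoneme splitting.
 ipaTable holds only the two mappings shown in the pseudocode above."
| ipaTable split |
ipaTable := Dictionary new.
ipaTable at: 'R' put: 'r'; at: 'IY' put: 'i_colon'.

split := [ :samPhoneme |
    samPhoneme isEmpty
        ifTrue: [ #() ]
        ifFalse: [
            | length |
            "Try the longest prefix first and shrink until the table knows it."
            length := (samPhoneme size to: 1 by: -1)
                detect: [ :n | ipaTable includesKey: (samPhoneme first: n) ]
                ifNone: [ Error signal: 'No mapping for ' , samPhoneme ].
            "Convert the matched prefix, then recurse on what is left."
            { ipaTable at: (samPhoneme first: length) } ,
                (split value: (samPhoneme allButFirst: length)) ] ].

split value: 'RIY'.
```

With this table, 'RIY' has no direct entry, so the block falls back to 'R' and then resolves the remaining 'IY' in the recursive call.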

3. Audio Synthesis Integration (August 2025)

Duration: ~2 weeks

Achievement: Complete integration with Phausto DSP system.

PAMDsp Class Features:


      "Complete TTS with prosody"
      dsp := PAMDsp new.
      dsp sayTextWithProsody: 'hello' asChild: false.
      
      "Number sequence generation"
      dsp sayNumbers: #(1 3 5 7).

Audio Parameters Implemented:

Pitch Control: adult (male): 275 Hz, child: 500 Hz
Amplitude Control: 0.5–1.0 range with stress-based modulation
Duration Control: 0.1–0.4 seconds based on phoneme stress values

Prosody Formula:


          durationForStress: stress  
              "Map stress in range [4..6] to a duration in range [0.1..0.4].
          
                  ----- INDEX ----
                   4 = short/fast (0.1s), 
                   6 = long/slow (0.4s)."
              ^ 0.1 + (((stress - 4) / 2.0) * 0.3)
          
          
          amplitudeForStress: stress  
              "Map stress in range [4..6] to an amplitude in range [0.5..1.0].
          
                  ----- INDEX ----
                   4 = soft (0.5), 
                   6 = strong (1.0)."
              ^ 0.5 + (((stress - 4) / 2.0) * 0.5)
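
As a quick sanity check, the two formulas can be copied into Playground blocks and evaluated for each stress level. The block variables below are only for this experiment; in PAM the formulas live in the methods shown above.

```smalltalk
"Playground copies of the two prosody formulas, for experimentation only."
| durationForStress amplitudeForStress |
durationForStress := [ :stress | 0.1 + (((stress - 4) / 2.0) * 0.3) ].
amplitudeForStress := [ :stress | 0.5 + (((stress - 4) / 2.0) * 0.5) ].

"Print the duration and amplitude for every stress level from 4 to 6."
(4 to: 6) do: [ :stress |
    Transcript
        show: 'stress ' , stress printString;
        show: ' -> duration ' , ((durationForStress value: stress) printShowingDecimalPlaces: 2);
        show: ', amplitude ' , ((amplitudeForStress value: stress) printShowingDecimalPlaces: 2);
        cr ].
```

Both mappings are linear, so stress 5 lands midway between the endpoints (about 0.25 s and 0.75). Neither formula clamps its input, so a stress value outside 4..6 would extrapolate beyond the documented ranges.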

4. Package Management & Distribution

Achievement: Created production-ready Pharo package with Metacello integration.

One-line installation command


      Metacello new
          baseline: 'PAM';
          repository: 'github://neerja-1984/PAM-GSoC-25-Project:master/src';
          onConflict: [ :ex | ex useIncoming ];
          onUpgrade: [ :ex | ex useIncoming ];
          load.

📂 Download the archive below, extract it, and place the extracted folders in your Documents folder

⬇️ Download Audio Samples

The folder structure should look like this:

          Documents/
          ├── phonemes
          └── numbers

Refer to the following in case of doubt

Technical Challenges Overcome 😎

1. Rule Ordering Bug Resolution


                
          self addRule: '(BREAK)' replacement: 'BREY5K'.
          self addRule: '(B)' replacement: 'B'.

Problem: Multiple rules matched the same input, leading to incorrect phoneme selection for word "BREAK".

Solution: Collect every rule that matches first, then choose the one with the longest pattern.

Impact: Ensured accurate English text-to-phoneme conversion.
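
The two-step selection can be sketched in the Playground with plain collections. This is a simplified illustration, not the code inside Reciter: rules are stored as pattern -> replacement associations, and prefix/suffix contexts are ignored.

```smalltalk
"Simplified sketch of longest-match rule selection (contexts ignored)."
| rules matching best |
"The two rules from the example above; the shorter one is listed first on purpose."
rules := { 'B' -> 'B'. 'BREAK' -> 'BREY5K' }.

"Step 1: collect every rule whose pattern matches at the current position."
matching := rules select: [ :rule | 'BREAK' beginsWith: rule key ].

"Step 2: among the matches, prefer the rule with the longest pattern."
best := matching detectMax: [ :rule | rule key size ].
best value.
```

Because the choice depends on pattern length rather than on the order in which rules were added, both rules can match 'BREAK' and the more specific one still wins.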

2. DSP Stereo Bug


            sampler := TpSampler new.
            sampler label: aLabel.
            
            samplePlayer := sampler pathToFolder: folderPath.
            
            "line that caused the problem"
            dspInstance := samplePlayer stereo asDsp.

Problem: Our audio files were a mix of mono and stereo, so some of them could not be heard properly.

Solution: Wrote helper scripts around FFmpeg to convert the mono audio files to stereo.

Impact: Reliable playback for all phonemes, standardized pipeline.
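
The helper scripts themselves are not reproduced in this report, so the following is only a sketch of the kind of command they could issue. The file names and the mono/ and stereo/ folder names are hypothetical; -i (input file) and -ac 2 (two output audio channels) are standard FFmpeg options.

```smalltalk
"Sketch: build one FFmpeg command line per mono sample.
 File and folder names are placeholders, not the project's real layout."
| monoFiles commands |
monoFiles := #('b.wav' 'k.wav').

"-ac 2 asks FFmpeg to write two audio channels, i.e. a stereo file."
commands := monoFiles collect: [ :name |
    'ffmpeg -i "mono/{1}" -ac 2 "stereo/{1}"' format: { name } ].

"Print the commands so they can be reviewed before running them in a shell."
commands do: [ :each | Transcript show: each; cr ].
```

The sketch only prints the commands; they can then be pasted into a terminal or run from a script of your choice.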

3. Phoneme File Indexing Bug


          myString := Reciter textToIPAPhonemes: 'book'.
          "myString = #('b' 'ʊ' 'k')"

          "list -> our audio files (presumably sorted by OS)"
          "for each character of myString -> find indexOf character from the list"
          "can be an issue: Files sorted by OS are platform-dependent and differ from how Phausto sorts them"
          
          [myString do: [ :i |  
              dsp setValue: ( list indexOf: i ) parameter: 'PAMSamplerIndex'.
              dsp trig: 'PAMSamplerGate'.
              0.2 seconds wait
          ]] fork.

Image 1 shows how Phausto sorts the files; Image 2 shows how the Windows filesystem sorts them.

Problem: Phausto expects files to be in a sorted order, as it uses the index to map the phoneme to the audio file as seen in the above code snippet.

Solution: Added a renaming script that renames every file as number_phonemeName.wav.

Impact: Files are now sorted the same way Phausto sorts them. They are renamed as 001_a_colon.wav, 002_aɪ.wav, 003_aʊ.wav, and so on.
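
The naming scheme can be reproduced with a few lines in the Playground. This sketch only computes the new names for the first three phonemes listed above; it does not touch any files, and the variable names are my own.

```smalltalk
"Sketch: derive number_phonemeName.wav names from an ordered list of phonemes."
| phonemeNames renamed |
"The first three phoneme names in the intended order, as a stand-in for the full list."
phonemeNames := #('a_colon' 'aɪ' 'aʊ').

"Prefix each name with its 1-based position, zero-padded to three digits."
renamed := phonemeNames collectWithIndex: [ :name :index |
    (index printPaddedWith: $0 to: 3) , '_' , name , '.wav' ].
```

The zero-padded prefix is what makes the order explicit: the position of each file is written into its name, so the lookup no longer depends on how a given platform happens to sort special characters.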

4. Cross-Platform Path Handling


          folderPath := FileLocator documents / aFolderName.

Problem: Windows and macOS use different path separators, which caused issues when locating the numbers and phonemes folders stored in Documents (e.g., C:\Users\<your-name>\Documents\numbers).

Solution: Implemented FileLocator-based path resolution to generate OS-independent paths.

Impact: Seamless cross-platform deployment; generalized way to configure paths for any OS.
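
For a quick check that the sample folders are where PAM expects them, the same FileLocator expression can be tried in the Playground. This is a small sketch using the two folder names from the installation section; the variable names and the log message are mine.

```smalltalk
"Sketch: resolve the sample folders relative to the user's Documents folder."
| phonemesFolder numbersFolder |
phonemesFolder := FileLocator documents / 'phonemes'.
numbersFolder := FileLocator documents / 'numbers'.

"fullName resolves the locator to an absolute path for the current platform."
Transcript show: phonemesFolder fullName; cr.
Transcript show: numbersFolder fullName; cr.

"Report a missing folder up front instead of failing later during playback."
phonemesFolder exists
    ifFalse: [ Transcript show: 'Missing folder: phonemes'; cr ].
```

Since the path is built with the / message rather than by concatenating strings, no separator character ever appears in the code.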

5. Audio Metadata Issue

Problem: Phausto (via libsndfile) cannot properly read audio files containing metadata tags.

Solution: Stripped metadata from all phoneme audio files using online-audio-converter.com. Confirmed playback success after conversion.

Timeline Chart of PAM Development ❤️

📚 🔬 Community Bonding & SAM Deep Dive

May 2025

Held initial mentorship meetings with Domenico and Nahuel to establish the project foundation. Researched SAM's 450 phonetic rules and the logic of its reciter. Set up the development environment with Pharo and Iceberg, and created the project's Git repository 😊

SAM Research, Rules Logic, Mentor Meetings, Project Setup

⚙️ 🧠 Pharo Foundation & Constants Implementation

June 2025

Successfully converted SAM constants and character flags to PAM and wrote passing test suites for them. Learned the differences between class-side, instance-side, and class-instance variables, along with advanced Pharo debugging techniques.

Constants Migration, Test Driven Development, Character Flags, Pharo Debugging

🎯 📝 Rules Engine & Pattern Matching Logic

Mid-June 2025

Extended the rules to every letter of the alphabet. Developed an OOP-based design pattern for the letter rules. Implemented a complex rule parsing system with prefix-pattern-suffix matching. Created comprehensive letter-specific rule dictionaries and achieved working textToPhoneme conversion for basic words like "HELLO" and "COLLEGE".

Pattern Matching, Rules Engine, OOPS Design Pattern, Text Processing

🐛 🔧 Major Bug Fixes & Logger Implementation

Late June 2025

Fixed critical sentence parsing issues and wrong phoneme generation caused by rule ordering problems. Implemented a SpringBoot-style logger that prints [CLASS NAME -> METHOD NAME -> LOG MESSAGE], resolved space handling in textToPhoneme, and achieved green test status for all alphabet rules with proper pattern priority matching.

Bug Resolution, Logger System, Rule Prioritization, Green Tests

🎵 🔊 First Audio Success with Phausto Integration

July 2025

Successfully integrated TpSampler and the DSP for playback of the number audio samples. Implemented the number audio samples (0-9), resolved Windows/Mac path separator issues, and produced the first working audio output. Added pragma annotations and fork-based asynchronous audio processing.

Phausto Integration, Audio Processing, TpSampler, Cross-platform, Async Processing

🌟 🎤 Phoneme-to-IPA Conversion & Speaking Words

Late July 2025

Developed a phoneme-to-IPA converter using a recursive greedy approach (longest matching IPA phoneme first). Created a comprehensive mapping dictionary, implemented compound phoneme splitting logic, and achieved the first successful word pronunciation ("DOG"), although it sounded very robotic. Established a sorted audio file indexing system for accurate phoneme playback.

IPA Conversion, Recursive Algorithm, Speaking Words, Audio Indexing, Phoneme Mapping

🎭 🎛️ Prosody & Voice Synthesis Mastery

Mid-August 2025

Implemented advanced prosody controls with stress-based amplitude and duration calculations. Added age-based voice presets (adult vs child), developed comprehensive audio parameter tuning (pitch, volume, note stretching), and converted mono audio samples to stereo for proper playback compatibility.

Prosody Control, Voice Synthesis, Age-based Voices, Audio Parameters, Mono to Stereo Conversion

📦 ⚡ Metacello Baseline & Project Optimization

Late August 2025

Created comprehensive Metacello baseline for seamless project installation. Resolved dependency conflicts, implemented proper package management, and achieved successful deployment in fresh Pharo images. All tests green with automated dependency resolution and conflict handling.

Metacello, Package Management, Deployment, Dependency Resolution, Project Distribution

🎉 🏆 Final Integration & GSoC Success

August 2025

✨ 101st Commit Achievement ✨

Completed the full text-to-speech pipeline with prosodic control! The final PAMDsp implementation supports speech generation with age-based voice characteristics and stress-sensitive pronunciation! 😎

Complete TTS Pipeline, Concatenative Synthesis based TTS model, Natural Speech, GSoC Success, Production Ready

Code Quality & Testing ❤️

Comprehensive Test Suite

86 unit tests covering all conversion scenarios
Green test status across full alphabet

Clean Architecture

Object-oriented design with clear separation of concerns
Utility classes for logging, path handling, and audio management

Performance Metrics

Metric	Value
Package Memory Consumption (package size)	< 1 GB
Rule Database [English letters → Grapheme]	402 rules across 26 letters
Rule Database [SAM Phoneme (Grapheme) → IPA phoneme]	47 rules
Audio Sample Count	44 IPA phonemes + 10 numbers
Package Load Time	< 1 minute

Contribution Summary

Total Commits: 109
Lines of Code: 5053 (as of writing this document)
Project Status: Production ready, with Metacello script, full documentation and test coverage

Future Development Opportunities 😎

Enhanced Prosody: Implement formant synthesis for more natural speech patterns.

Voice Customization: Additional gender voice presets.

Real-time Processing: Streaming audio generation for long texts.

Acknowledgements ❤️

Special thanks to my mentors Domenico Cipriani and Nahuel Palumbo for all their guidance, from Pharo architecture, DSP integration, and software engineering best practices to the debugging sessions we had together. PAM wouldn't have reached this stage without them. Their expertise in both linguistic processing and audio synthesis was instrumental in achieving the project goals.