


Organization Name: Pharo Consortium
Contributor: Neerja Doshi
Mentors: Domenico Cipriani, Nahuel Palumbo
Project Repository: PAM-GSoC-25-Project
Project Duration: May 9, 2025 – September 1, 2025
My Deliverables
TTS conversion code snippet and its Transcript (Logs)

Paste these code snippets into a Playground to hear PAM speak!
| dsp |
"initialize the DSP class"
dsp := PAMDsp new.
"audio for an array of numbers"
dsp sayNumbers: #(5 6 7 2).
"audio for all numbers from 0 to 9"
dsp sayNumbers0to9.
"audio for sentence"
dsp sayText: 'Dogs'.
"audio for sentence with prosody and child preset"
dsp sayTextWithProsody: 'Dogs' asChild: true.
"audio for sentence with prosody and adult preset"
dsp sayTextWithProsody: 'golf day!' asChild: false.
"Transcript for logs"
Transcript open.
Use sayNumbers0to9, sayText:, and sayTextWithProsody:asChild: to hear PAM speak (a quick and easy way to test TTS).
Project Abstract
PAM (Pharo Automated Mouth) is a rule-based Text-to-Speech (TTS) system developed for the Pharo environment. Unlike modern deep-learning TTS systems that require extensive datasets and computational resources, PAM employs a lightweight rule-based approach inspired by the classic SAM (Software Automatic Mouth) system.
The system implements text-to-phoneme conversion using 402 English → Grapheme rules and 47 Grapheme → IPA phoneme rules, producing accurate IPA phoneme sequences with stress markers.
For synthesis, PAM adopts a concatenative sample-based approach: instead of generating audio purely algorithmically, it plays back pre-recorded audio samples of phonemes. These are sequenced and stitched together through a TpSampler-driven DSP pipeline (integrated with Phausto), enabling intelligible speech output.
On top of this, PAM introduces prosody control, allowing each phoneme playback to be modulated by parameters such as pitch (adult/child voice presets), amplitude (stress-based loudness), and duration (temporal stretching). This hybrid design (rule-based phoneme generation + sample-based concatenative synthesis + prosodic control) makes PAM lightweight and customizable while still producing natural-sounding speech.
Core Architecture
PAM consists of four main components:
Reciter Engine: Class Reciter
→ Converts English text to graphemes (SAM phonemes) using pattern-matching rules adapted from SAM.
IPA Converter: Class PhonemeToIPAConvertor
→ Transforms graphemes into International Phonetic Alphabet (IPA) phonemes.
Audio Synthesis: Class PAMDsp
→ Leverages Phausto's DSP capabilities to generate speech from pre-recorded audio samples of phonemes.
Prosody Generation: Class Parser
→ Gives PAM prosody by adjusting frequency, amplitude, and duration, and by generating age-like voice variations.
Major Accomplishments
1. Complete Rule Engine Implementation (June–July 2025)
Duration: ~6 weeks
Achievement: Successfully implemented all 26 letter rules with comprehensive test coverage.
- Created a hierarchical rule system with a parent LetterRules class and 26 child classes (A–Z)
- Total rule count: 402
- Implemented special character matching for prefix/suffix contexts (#, ^, @, %)
- Added comprehensive test suite covering all alphabetic conversions
"Gasoline test"
| phonemeOutput inputText |
inputText := 'Gasoline'.
phonemeOutput := Reciter textToPhonemes: inputText.
self assert: phonemeOutput equals: #('G' 'EY4S' 'OW' 'L' 'IH' 'N' 'EH').
Test Results: English words are correctly converted to phonemes, e.g. Gasoline → ('G' 'EY4S' 'OW' 'L' 'IH' 'N' 'EH').
2. IPA Phoneme Conversion System (July–August 2025)
Duration: ~3 weeks
Achievement: Developed a recursive SAM phoneme (grapheme) to IPA phoneme converter for audio file mapping.
Pseudocode for recursive compound phoneme splitting:
splitCompoundPhoneme: 'RIY'
→ splits to: #('R' 'IY')
→ converts to: #('r' 'i_colon')
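The splitting step can be illustrated with a small Playground sketch. This is only one possible reading of the pseudocode, not PAM's actual implementation: the `known` phoneme list, the `ipa` dictionary, and the `split` block are hypothetical names, and the strategy shown (peel the longest known phoneme off the front, then recurse on the rest) is an assumption.

```smalltalk
| known ipa split |
"Hypothetical subset of SAM phonemes and their IPA sample names."
known := #('R' 'IY' 'EY' 'S').
ipa := Dictionary new.
ipa at: 'R' put: 'r'; at: 'IY' put: 'i_colon'.

"Recursively peel the longest known phoneme off the front of the text."
split := nil.
split := [ :text |
	text isEmpty
		ifTrue: [ #() ]
		ifFalse: [
			| head |
			head := (known select: [ :p | text beginsWith: p ])
				detectMax: [ :p | p size ].
			head isNil
				ifTrue: [ { text } ] "nothing matches: keep the remainder as-is"
				ifFalse: [ { head } , (split value: (text allButFirst: head size)) ] ] ].

"Split 'RIY', then map each part to its IPA name."
(split value: 'RIY') collect: [ :p | ipa at: p ifAbsent: [ p ] ].
"#('r' 'i_colon')"
```

Choosing the longest prefix first avoids splitting a two-letter phoneme such as 'IY' into unknown single letters.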
3. Audio Synthesis Integration (August 2025)
Duration: ~2 weeks
Achievement: Complete integration with Phausto DSP system.
PAMDsp Class Features:
"Complete TTS with prosody"
dsp := PAMDsp new.
dsp sayTextWithProsody: 'hello' asChild: false.
"Number sequence generation"
dsp sayNumbers: #(1 3 5 7).
Audio Parameters Implemented:
- Pitch Control: male: 275Hz, child: 500Hz
- Amplitude Control: 0.5–1.0 range with stress-based modulation
- Duration Control: 0.1–0.4 seconds based on phoneme stress values
Prosody Formula:
durationForStress: stress
	"Map a stress value in the range [4..6] linearly to a duration in the range [0.1..0.4] seconds.
	----- INDEX ----
	4 = short/fast (0.1s),
	6 = long/slow (0.4s)."
	^ 0.1 + (((stress - 4) / 2.0) * 0.3)

amplitudeForStress: stress
	"Map a stress value in the range [4..6] linearly to an amplitude in the range [0.5..1.0].
	----- INDEX ----
	4 = soft (0.5),
	6 = strong (1.0)."
	^ 0.5 + (((stress - 4) / 2.0) * 0.5)
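To see what the two mappings above produce, the same expressions can be wrapped in blocks and evaluated in a Playground without loading PAM. The block names are illustrative only; the arithmetic is copied from the methods.

```smalltalk
| durationForStress amplitudeForStress |
"Same linear maps as the PAM methods, written as standalone blocks."
durationForStress := [ :stress | 0.1 + (((stress - 4) / 2.0) * 0.3) ].
amplitudeForStress := [ :stress | 0.5 + (((stress - 4) / 2.0) * 0.5) ].

"Print one line per stress level in the supported range."
#(4 5 6) do: [ :stress |
	Transcript
		show: 'stress ' , stress printString;
		show: '  duration ' , ((durationForStress value: stress) printShowingDecimalPlaces: 2);
		show: '  amplitude ' , ((amplitudeForStress value: stress) printShowingDecimalPlaces: 2);
		cr ].
```

The midpoint stress 5 lands halfway in both ranges: a duration of 0.25 s and an amplitude of 0.75.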
4. Package Management & Distribution
Achievement: Created production-ready Pharo package with Metacello integration.
One-line installation command
Metacello new
baseline: 'PAM';
repository: 'github://neerja-1984/PAM-GSoC-25-Project:master/src';
onConflict: [ :ex | ex useIncoming ];
onUpgrade: [ :ex | ex useIncoming ];
load.
Extract the archive below and place the extracted folders in your Documents folder.
Download Audio Samples
The folder structure should be as follows:
Documents/
├── phonemes
└── numbers
Technical Challenges Overcome
1. Rule Ordering Bug Resolution
self addRule: '(BREAK)' replacement: 'BREY5K'.
self addRule: '(B)' replacement: 'B'.
Problem: Both rules match the input "BREAK", so the wrong rule could be selected, producing incorrect phonemes for the word.
Solution: First collect every rule that matches, then choose the one with the longest pattern.
Impact: Ensured accurate English text-to-phoneme conversion.
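The longest-match idea can be sketched in a Playground as follows. The rule table here is a hypothetical stand-in (plain pattern -> replacement associations without PAM's bracket syntax or context characters), so it illustrates the selection strategy rather than the real LetterRules code.

```smalltalk
| rules input matching best |
"Hypothetical rule table: pattern -> replacement. The short rule is listed
first on purpose, to show that declaration order no longer matters."
rules := {
	'B' -> 'B'.
	'BREAK' -> 'BREY5K' }.
input := 'BREAK'.

"Step 1: collect every rule whose pattern matches at the start of the input."
matching := rules select: [ :rule | input beginsWith: rule key ].

"Step 2: among the matches, keep the rule with the longest pattern."
best := matching detectMax: [ :rule | rule key size ].

best value.
"'BREY5K'"
```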
2. DSP Stereo Bug
sampler := TpSampler new.
sampler label: aLabel.
samplePlayer := sampler pathToFolder: folderPath.
"problem inducing line"
dspInstance := samplePlayer stereo asDsp.
Problem: Our audio files were a mix of mono and stereo, so some of them could not be heard properly.
Solution: Wrote helper scripts around FFmpeg to convert the mono audio files to stereo.
Impact: Reliable playback for all phonemes, standardized pipeline.
3. Phoneme File Indexing Bug
myString := Reciter textToIPAPhonemes: 'book'.
"myString = #('b' 'ʊ' 'k')"
"list -> our audio files (presumably sorted by OS)"
"for each character of myString -> find indexOf character from the list"
"can be an issue: Files sorted by OS are platform-dependent and differ from how Phausto sorts them"
[myString do: [ :i |
dsp setValue: ( list indexOf: i ) parameter: 'PAMSamplerIndex'.
dsp trig: 'PAMSamplerGate'.
0.2 seconds wait
]] fork.
Image 1 shows how Phausto sorts the files; Image 2 shows how the Windows file system sorts them.


Problem: Phausto expects the files in sorted order, because it uses the index to map each phoneme to its audio file, as seen in the code snippet above.
Solution: Added a renaming script that renames all files to number_phonemeName.wav.
Impact: Files are now sorted the way Phausto sorts them. They are named 001_a_colon.wav, 002_aɪ.wav, 003_aʊ.wav, and so on.
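The effect of the numeric prefix can be sketched in a Playground. This is not the project's renaming script: the phoneme labels below are hypothetical ASCII stand-ins for the real IPA names, and the sketch only shows that zero-padded prefixes make plain alphabetical order agree with the intended index.

```smalltalk
| phonemes fileNames sorted |
"Hypothetical ASCII stand-ins for the first three phoneme names."
phonemes := #('a_colon' 'ai' 'au').

"Prefix each name with its 1-based index, zero-padded to three digits."
fileNames := phonemes withIndexCollect: [ :name :i |
	(i printPaddedWith: $0 to: 3) , '_' , name , '.wav' ].
"#('001_a_colon.wav' '002_ai.wav' '003_au.wav')"

"Sorting the names alphabetically, even from a shuffled start, restores index order."
sorted := fileNames reversed asSortedCollection asArray.
sorted = fileNames.
"true"
```

Zero-padding matters: without it, a name starting with 10 would sort before one starting with 2.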
4. Cross-Platform Path Handling
folderPath := FileLocator documents / aFolderName.
Problem: Windows and macOS path separators differ, causing issues with the numbers and phonemes folders stored in Documents (e.g., C:\Users\<your-name>\Documents\numbers).
Solution: Implemented FileLocator-based path resolution to generate OS-independent paths.
Impact: Seamless cross-platform deployment and a general way to configure paths on any OS.
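A minimal Playground sketch of this approach, assuming the phonemes folder has been placed in Documents as described in the installation steps. It uses only standard Pharo FileSystem messages and is not taken from PAM's path utility class.

```smalltalk
| folder |
"FileLocator documents resolves to the user's Documents folder on the current OS;
the / message appends a path segment without hard-coding a separator."
folder := FileLocator documents / 'phonemes'.

"Report how many samples are available, or where the folder was expected."
folder exists
	ifTrue: [ (folder filesMatching: '*.wav') size ]
	ifFalse: [ 'Missing folder: ' , folder fullName ].
```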
5. Audio Metadata Issue
Problem: Phausto (via libsndfile) cannot properly read audio files containing metadata tags.
Solution: Stripped metadata from all phoneme audio files using online-audio-converter.com. Confirmed playback success after conversion.
Timeline Chart of PAM Development
Community Bonding & SAM Deep Dive
Pharo Foundation & Constants Implementation
Rules Engine & Pattern Matching Logic
Major Bug Fixes & Logger Implementation
First Audio Success with Phausto Integration
Phoneme-to-IPA Conversion & Speaking Words
Prosody & Voice Synthesis Mastery
Metacello Baseline & Project Optimization
Final Integration & GSoC Success
Code Quality & Testing
Comprehensive Test Suite
- 86 unit tests covering all conversion scenarios
- Green test status across full alphabet
Clean Architecture
- Object-oriented design with clear separation of concerns
- Utility classes for logging, path handling, and audio management
Performance Metrics
Metric | Value |
---|---|
Package Memory Consumption | < 1 GB [Package size] |
Rule Database [English letters → Grapheme] | 402 rules across 26 letters |
Rule Database [SAM Phoneme (Grapheme) → IPA phoneme] | 47 rules |
Audio Sample Count | 44 IPA phonemes + 10 numbers |
Package Load Time | < 1 minute |
Contribution Summary
- Total Commits: 109
- Lines of Code: 5053 (as of writing this document)
- Project Status: Production ready, with Metacello script, full documentation and test coverage
Future Development Opportunities
Enhanced Prosody: Implement formant synthesis for more natural speech patterns.
Voice Customization: Additional gender voice presets.
Real-time Processing: Streaming audio generation for long texts.
Acknowledgements
Special thanks to my mentors Domenico Cipriani and Nahuel Palumbo for all their guidance, from Pharo architecture and DSP integration to software engineering best practices and our debugging sessions. PAM wouldn't have reached this stage without them. Their expertise in both linguistic processing and audio synthesis was instrumental in achieving the project goals.
Noteworthy links
1. The project builds upon the foundational work of the original SAM system and leverages the modern capabilities of Phausto for audio generation.
2. Understanding IPA phonemes: WalkOnCross GitHub. The dataset of 44 phonemes was taken from here: GitHub
3. Remove Metadata from audio files : online-audio-converter
Useful Links
1. PAM-GSoC-25-Project repository
2. GSoC'25 Weekly Updates README
3. My Proposal for GSoC'25 -> PAM : Pharo's TTS model
4. My tutorial to learn Pharo as a beginner