When you say the quality is bad, what exactly do you hear?
We've written a G.729 RTP <-> APS converter, but the sound is muffled and low volume.
For example, say we receive a 20-byte RTP payload. That means it holds two 10-byte G.729 frames, so we start with the first 10 bytes (let's say we put them in an 8-bit descriptor, b[0]...b[9]) and apply the following conversion (according to the APS documentation and RFC 3551):
Code:
TBuf16<12> nokiaBuf;
nokiaBuf.FillZ(12);
nokiaBuf[0] = 1; // Full rate
nokiaBuf[1] = (TUint16) b[0];
nokiaBuf[2] = (((TUint16)(b[1] & 0xff)) << 2) | (((TUint16)(b[2] & 0xc0)) >> 6);
nokiaBuf[3] = (((TUint16)(b[2] & 0x3f)) << 2) | (((TUint16)(b[3] & 0xc0)) >> 6);
nokiaBuf[4] = (((TUint16)(b[3] & 0x20)) >> 5);
nokiaBuf[5] = (((TUint16)(b[3] & 0x1f)) << 8) | (TUint16) b[4];
nokiaBuf[6] = (((TUint16)(b[5] & 0xf0)) >> 4);
nokiaBuf[7] = (((TUint16)(b[5] & 0x0f)) << 3) | (((TUint16)(b[6] & 0xe0)) >> 5);
nokiaBuf[8] = (TUint16)(b[6] & 0x1f);
nokiaBuf[9] = (((TUint16)(b[7] & 0x1f)) << 5) | (((TUint16)(b[8] & 0xf8)) >> 3);
nokiaBuf[10]= (((TUint16)(b[8] & 0x07)) << 1) | (((TUint16)(b[9] & 0x80)) >> 7);
nokiaBuf[11]= (TUint16)(b[9] & 0x7f);
Now nokiaBuf[0]...nokiaBuf[11] holds the 24-byte frame (twelve 16-bit words) we can send to APS. After that we repeat the same conversion with the remaining 10 bytes of the RTP payload.
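As a cross-check on the hand-written shifts and masks, here is a minimal table-driven sketch of the same unpacking. It assumes the APS frame wants the 11 parameters in the order and widths that the conversion above implies (L0+L1, L2+L3, P1, P0, C1, S1, GA1+GB1, P2, C2, S2, GA2+GB2), read MSB-first as in the RFC 3551 bit layout. The names ReadBits, UnpackSpeechFrame and kSpeechWidths are made up for illustration, and plain C++ types stand in for the Symbian descriptors:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Field widths in transmission order, grouped the same way as the
// hand-written conversion: L0+L1, L2+L3, P1, P0, C1, S1, GA1+GB1,
// P2, C2, S2, GA2+GB2. They add up to 80 bits (10 bytes).
constexpr std::array<int, 11> kSpeechWidths = {8, 10, 8, 1, 13, 4, 7, 5, 13, 4, 7};

// Reads `width` bits MSB-first starting at bit offset `pos`, advancing `pos`.
inline std::uint16_t ReadBits(const std::uint8_t* data, std::size_t& pos, int width) {
    assert(width > 0 && width <= 16);
    std::uint16_t value = 0;
    for (int i = 0; i < width; ++i, ++pos) {
        const int bit = (data[pos / 8] >> (7 - pos % 8)) & 1;
        value = static_cast<std::uint16_t>((value << 1) | bit);
    }
    return value;
}

// Hypothetical helper: unpacks one 10-byte G.729 frame into 12 words.
// Word 0 is the rate marker (1 = full rate, as in the post).
inline std::array<std::uint16_t, 12> UnpackSpeechFrame(const std::uint8_t* b) {
    std::array<std::uint16_t, 12> out{};
    out[0] = 1;
    std::size_t pos = 0;
    for (std::size_t i = 0; i < kSpeechWidths.size(); ++i) {
        out[i + 1] = ReadBits(b, pos, kSpeechWidths[i]);
    }
    assert(pos == 80);  // every bit of the 10-byte frame was consumed
    return out;
}
```

Comparing its output word by word against the hand-written version is one way to spot a mask that is narrower than its field. Under this layout, for instance, octet 7 contributes all eight of its bits to C2.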
If 2 bytes are left over after the 10-byte frames, we interpret them as a SID frame and use the following conversion:
Code:
TBuf16<5> nokiaBuf;
nokiaBuf.FillZ(5);
nokiaBuf[0] = 2; // SID rate
nokiaBuf[1] = (((TUint16)(b[0] & 0x80)) >> 7);
nokiaBuf[2] = (((TUint16)(b[0] & 0x7c)) >> 2);
nokiaBuf[3] = (((TUint16)(b[0] & 0x03)) << 2) | (((TUint16)(b[1] & 0xc0)) >> 6);
nokiaBuf[4] = (((TUint16)(b[1] & 0x3e)) >> 1);
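The framing described above (some number of 10-byte speech frames, optionally followed by one 2-byte SID frame) can be sketched as a simple length check. This is only an illustration under that assumption; PayloadLayout and SplitG729Payload are hypothetical names, not part of APS or any RTP stack:

```cpp
#include <cstddef>

// Result of classifying an RTP payload length.
struct PayloadLayout {
    std::size_t speechFrames;  // number of 10-byte speech frames
    bool hasSid;               // true if a trailing 2-byte SID frame follows
    bool valid;                // false if the length fits neither pattern
};

// Hypothetical helper: splits a payload length into 10-byte speech frames
// plus an optional trailing 2-byte SID frame.
inline PayloadLayout SplitG729Payload(std::size_t payloadLen) {
    const std::size_t kSpeechBytes = 10;
    const std::size_t kSidBytes = 2;
    PayloadLayout layout{payloadLen / kSpeechBytes, false, true};
    const std::size_t rest = payloadLen % kSpeechBytes;
    if (rest == kSidBytes) {
        layout.hasSid = true;   // remainder is exactly one SID frame
    } else if (rest != 0) {
        layout.valid = false;   // leftover bytes match no known frame size
    }
    return layout;
}
```

With this, a 20-byte payload is classified as two speech frames and no SID, and a 22-byte payload as two speech frames followed by a SID frame.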
However, as said, this leads to a muffled sound. What are we doing wrong?