Stegafoto: a lens which embeds audio and text inside images
Stegafoto is a Windows Phone Lens that lets the user embed a piece of audio or text within an image. This article explains the theory used to embed the content (virtually "loss free") along with technical details of the implementation.
Introduction
Embedding text or audio within an image can make it easier for the photographer to vividly relive the experience when browsing the image months after it was taken. The technique used here is "fat-free" (it does not increase the size of the image) and does not visibly distort or affect the quality of the image. Briefly put, the technique applies the principle of steganography with a simple "even-odd" encoding scheme in the least significant bits of the pixels in the image.
The article explains how this result has been achieved at two levels. The first part is structured so that even someone with no programming experience should be able to get a feel for how it works - all you need is an open mind. The "Technical details" parts that follow assume that the reader is familiar with C# and JavaScript programming.
The video below shows this process:
Walkthrough
To simplify the explanation, we first discuss how to embed text in the image (embedding audio is very similar and we discuss it briefly below).
The case of embedding text into an image is demonstrated in the following steps:
Note that the method described below is only one of many ways of performing this task. While writing the application I used a test-driven approach coupled with rapid prototyping, which is how I ended up with this series of steps. I did not refactor the algorithm to optimize the solution (e.g. by implementing a fault-tolerant scheme or picture upload functionality), as I am interested in proving the concept rather than writing a product. I used a 3rd party library, Imagetools for Silverlight, as the PNG encoder for saving the captured image stream to a file. I chose PNG rather than JPG because PNG is lossless, which guarantees that the pixel changes made by the algorithm are preserved in the final image.
Embedding the data
To explain the concept, let's use a real-life analogy. Say we would like to mix two substances together, such as sugar and water. We would put both in a bowl and stir. If you then inspected the bowl, it would look as if the water had remained intact but the sugar had vanished. The sugar "disappears" because stirring breaks it down into particles so small that they can no longer be distinguished with the naked eye. It is important to realize that the sugar has neither escaped the bowl nor been transformed into something else. It is simply present in a different form.
With this in mind, let's examine how we mix text (the sugar) and image (the water) together so that the text dissolves into the image. As in the analogy, our "stirring" method reduces the characters of the text to a microscopic form so that they are indistinguishable among the pixels of the image. In computing terms, the atomic form to which we reduce the text is its bit representation. As you may know, a bit is either 1 (high) or 0 (low). Generally speaking, then, we convert the text and the image into bits and mix the two. You might wonder how, if everything is bits, we can later separate the character bits from the pixel bits in order to recover the embedded content. The answer is that we need a systematic method of doing the "mixing" (hence this article) so that we can always locate the character bits in the ocean of bits. Knowing their positions, we can later read those bits and reconstruct the embedded data. Note that, unlike in the analogy, where the volume of the mixture increases, the size of the image does not grow: we are replacing bits, not adding any. To understand this data embedding (mixing) process, we first need to cover some basics:
The anatomy of a piece of text:
In computing terms, a text is a sequence of characters known as a string. For example, the string Hi is the letter H followed by the letter i. At the bit (atomic) level, each character is (to simplify) represented by a sequence of 8 bits known as a byte. For instance, on the device H is represented by the sequence 01001000 while i is represented by 01101001. See the technical details sub-section for how to convert a string to its bit representation.
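As a quick illustration of the byte representation described above, here is a minimal JavaScript sketch. The helper names `charToBits` and `stringToBits` are made up for this example and are not part of the Stegafoto source; the sketch assumes, as the article does, that every character fits in 8 bits.

```javascript
// Hypothetical helper: the 8-bit pattern of one character as a string of 0s and 1s.
// Assumes the character code fits in a single byte (0-255).
function charToBits(ch) {
  return ch.charCodeAt(0).toString(2).padStart(8, "0");
}

// Hypothetical helper: concatenate the bit patterns of every character.
function stringToBits(str) {
  return Array.from(str, (ch) => charToBits(ch)).join("");
}

console.log(charToBits("H"));    // 01001000
console.log(charToBits("i"));    // 01101001
console.log(stringToBits("Hi")); // 0100100001101001
```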
The anatomy of an image (pixel):
An image is made up of picture elements known as pixels. A pixel represents the color information at a particular location in the image, so when you look at an image on a computer screen, you are simply looking at a grid of colors. According to the ARGB color model, a color is defined by 4 properties, or channels: Alpha, which indicates the amount of transparency; Red, the amount of red; Green, the amount of green; and Blue, the amount of blue. In combination, these 4 properties describe a particular color as well as how see-through it is. To implement this model, a pixel uses a sequence of 4 bytes (4 x 8 = 32 bits) to represent the color information at a given location, in other words one byte per channel. The figure below illustrates this.
As can be seen, the geography of an image can be viewed as an xy-coordinate system with the first pixel at coordinate (1,1). Each pixel is decomposed into 4 channels according to the ARGB color model, and each channel is one byte long, expressed as a sequence of 8 bits.
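The channel layout can also be shown in a few lines of JavaScript. This is only a sketch: the function name `unpackArgb` and the packing order (alpha in the highest byte) are assumptions made for the illustration, not something taken from the Stegafoto source.

```javascript
// Split a 32-bit ARGB value into its four 8-bit channels.
// Assumed packing for this example: alpha is the highest byte, blue the lowest.
function unpackArgb(argb) {
  return {
    alpha: (argb >>> 24) & 0xff,
    red: (argb >>> 16) & 0xff,
    green: (argb >>> 8) & 0xff,
    blue: argb & 0xff,
  };
}

const pixel = unpackArgb(0xff3366cc);
console.log(pixel); // { alpha: 255, red: 51, green: 102, blue: 204 }

// The least significant bit of each channel is simply its value modulo 2:
const lsbs = Object.values(pixel).map((v) => v % 2);
console.log(lsbs); // [ 1, 1, 0, 0 ]
```

Those four least significant bits are the "slots" that the encoding described next writes into.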
The "mixing" process:
Breaking the data into its bit representation is one half of the fusing (or mixing) process. The other half consists of placing the bits of the text in predefined positions in the pixels so that they can be recovered later. From now on we will refer to this process as encoding. The encoding method we adopt is technically known as altering the least significant bits of the channels of the pixels. Doing so introduces the least possible perturbation in the image.
To simplify the explanation, we are going to examine how the letter H is dissolved into the image. As you may recall from the explanation above, the letter H is represented by the sequence 01001000. The algorithm behind Stegafoto employs all 4 channels for encoding, with one bit stored per channel, which means that each character (8 bits long) needs (8 / 4 =) 2 pixels. Let's further assume that we start the encoding with the first pixel (i.e. the pixel at coordinate (1,1)), so we need one more pixel. For simplicity and convenience we use the adjacent pixel, that is, the pixel at coordinate (2,1) rather than the pixel at (1,2). In the schema below, which illustrates the idea, these two pixels are referred to as P1 and P2.
As can be seen, if we lay out all the channels of the pixels and examine their least significant bits, we are effectively looking at a data space in which we can store information. If we decoded the sequence of 8 least significant bits in the "before" part, we would get a character other than H. If we did the same with the "after" part, we would indeed get H. Note also which bits and bytes are affected by the encoding process.
As you can see in the illustration above, the sequence of all the least significant bits in pixels P1 and P2 is exactly the binary representation of the letter H. This is how we place the bits of the text so that we can easily recover them later while making the least possible disturbance to the color space of the image. Without going into much detail, we use the odd-even nature of each channel value to make the changes to the least significant bits: an odd value stands for 1 and an even value for 0. If you want to know more, see the technical details accompanying this section.
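The P1/P2 example can be reproduced with a short JavaScript sketch. The function names `embed` and `extract` and the sample channel values are invented for this illustration; the sketch treats the image as a flat array of channel bytes (4 per pixel) and is not the Lens code itself.

```javascript
// channels: flat array of channel bytes (4 per pixel).
// message: string whose characters each fit in 8 bits.
function embed(channels, message) {
  const out = Uint8Array.from(channels);
  if (message.length * 8 > out.length) {
    throw new Error("Message does not fit in the image");
  }
  for (let c = 0; c < message.length; c++) {
    const code = message.charCodeAt(c);
    for (let b = 0; b < 8; b++) {
      const bit = (code >> (7 - b)) & 1;  // most significant bit first
      const idx = c * 8 + b;
      out[idx] = (out[idx] & 0xfe) | bit; // even value = 0, odd value = 1
    }
  }
  return out;
}

// Read charCount characters back from the least significant bits.
function extract(channels, charCount) {
  let text = "";
  for (let c = 0; c < charCount; c++) {
    let code = 0;
    for (let b = 0; b < 8; b++) {
      code = (code << 1) | (channels[c * 8 + b] & 1);
    }
    text += String.fromCharCode(code);
  }
  return text;
}

// Two pixels (8 channels) with arbitrary color values:
const before = [255, 120, 64, 33, 255, 121, 66, 30];
const after = embed(before, "H");
console.log(Array.from(after)); // [ 254, 121, 64, 32, 255, 120, 66, 30 ]
console.log(extract(after, 1)); // H
```

No channel value moves by more than 1, which is why the change is not visible in the picture.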
Recording audio (as in the above video) is the same process except that we have to convert the captured sound from the microphone into a binary sequence. This is done in the following manner:
- Get the recorded sound bite as PCM data and apply the relevant header in order to convert it to WAV. The algorithm for converting PCM to WAV can be found here.
- Next take WAV data as a sequence of bytes and convert it to Base64 encoding in order to get a string representation.
- Take the string and pass it to the convert-to-binary method mentioned above.
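The last two steps can be sketched in JavaScript as follows. The helper `bytesToBase64` is hypothetical, and the four bytes used here are only a stand-in for real WAV data (every WAV file starts with the ASCII characters "RIFF"). The spread call is fine for a short array like this one; a real recording would need to be converted in chunks.

```javascript
// Hypothetical helper: Base64-encode raw bytes (browser or Node.js).
function bytesToBase64(bytes) {
  if (typeof btoa === "function") {
    return btoa(String.fromCharCode(...bytes));
  }
  return Buffer.from(bytes).toString("base64");
}

// Stand-in for WAV data: the four bytes that spell "RIFF".
const wavBytes = new Uint8Array([0x52, 0x49, 0x46, 0x46]);
const asText = bytesToBase64(wavBytes);

console.log(asText);            // UklGRg==
console.log(asText.length * 8); // 64
```

Base64 produces 4 characters for every 3 bytes of input, so the text that is embedded is about a third larger than the WAV data itself. That is the price of being able to reuse the same string-to-bits path as for plain text.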
Technical details
In this sub-section I give code snippets, with explanations in the comments, showing how the above algorithm has been implemented.
Converting a String to an array of bits (boolean):
private bool[] ConvertStringToBitArray(String str)
{
bool[] bitArray = new bool[8 * str.Length];
int j = 0;
foreach (char c in str)
{
for (int i = 0; i < 8; i++)
{
bitArray[j + i] = (((c >> (7 - i)) & 0x00000001) == 1 ? true : false);
}
j += 8;
}
return bitArray;
}
Encoding a bit in a pixel:
/**
* Method below is called from the following context:
*
* for (int i = 0; i < embeddedDataAsBitArray.Length; i++)
* {
* // pngImage.Pixels[i] is referring to a channel in the pixel. E.g. when i%4 == 1 we are accessing the Red channel of the pixel.
* pngImage.Pixels[i] = Encode(embeddedDataAsBitArray[i], pngImage.Pixels[i]);
* }
*/
private byte Encode(bool bit, byte val)
{
if (val % 2 == 1)
{
if (bit == false) // => byte is odd and we would like to write a 0
{
val--;
}
}
else
{
if (bit == true) // => byte is even and we would like to write a 1
{
val++;
}
}
return val;
}
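A question that came up in review is whether this even/odd rule is the same thing as simply replacing the least significant bit. The JavaScript sketch below (function names are made up for the illustration) ports the rule and checks it against the bitwise formulation for every possible byte value.

```javascript
// Port of the even/odd rule: make the value odd for a 1 and even for a 0.
function encodeParity(bit, val) {
  if (val % 2 === 1) {
    return bit ? val : val - 1;
  }
  return bit ? val + 1 : val;
}

// Bitwise formulation: clear the least significant bit, then set it to the data bit.
function encodeLsb(bit, val) {
  return (val & 0xfe) | (bit ? 1 : 0);
}

// Decoding only needs the parity of the channel value.
function decode(val) {
  return val % 2 === 1;
}

// Check every byte value against both data bits.
let mismatches = 0;
for (let val = 0; val <= 255; val++) {
  for (const bit of [false, true]) {
    const encoded = encodeParity(bit, val);
    if (encoded !== encodeLsb(bit, val) || decode(encoded) !== bit) {
      mismatches++;
    }
  }
}
console.log(mismatches); // 0
```

The two formulations agree everywhere. The rule also never leaves the 0-255 range, because 255 is odd (so it is never incremented) and 0 is even (so it is never decremented).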
Converting a PCM to a WAV according to the algorithm found here:
Encoding ENCODING = System.Text.Encoding.UTF8;
// User has pressed the 'Record Audio' button:
private void RecordAudio(object sender, GestureEventArgs e)
{
e.Handled = true;
Debug.WriteLine("Recording Audio ...");
if (_mic.State == MicrophoneState.Stopped)
{
Debug.WriteLine("Audio Sample Rate: {0}", _mic.SampleRate);
_audioStream.SetLength(0);
// Write a header to the stream so that we can have a WAV file:
// This document was used to create header: https://ccrma.stanford.edu/courses/422/projects/WaveFormat/
_audioStream.Write(ENCODING.GetBytes("RIFF"), 0, 4);
// This will be filled later once the recording is done. I.e. we would know the size of data.
_audioStream.Write(BitConverter.GetBytes(0), 0, 4);
// WAVE is made up of 2 parts: Format (fmt ) which describes the audio data such as, channels,
// bitrate, etc and then (data) which is the actual audio data.
_audioStream.Write(ENCODING.GetBytes("WAVE"), 0, 4);
// Writing the Format part:
_audioStream.Write(ENCODING.GetBytes("fmt "), 0, 4);
// Size of the rest of the Format part: 16 bytes, which is the value used for PCM audio.
_audioStream.Write(BitConverter.GetBytes(16), 0, 4);
// Audio format: 1 means PCM (uncompressed).
_audioStream.Write(BitConverter.GetBytes((short)1), 0, 2);
// Number of channels: 1 means mono.
_audioStream.Write(BitConverter.GetBytes((short)1), 0, 2);
// Sample rate (samples per second).
_audioStream.Write(BitConverter.GetBytes(_mic.SampleRate), 0, 4);
// Byte rate (bytes per second of audio).
_audioStream.Write(BitConverter.GetBytes(_mic.SampleRate * BYTES_PER_SAMPLE), 0, 4);
// Block align (bytes per sample across all channels).
_audioStream.Write(BitConverter.GetBytes((short)BYTES_PER_SAMPLE), 0, 2);
// Bits per sample.
_audioStream.Write(BitConverter.GetBytes((short)BITS_PER_SAMPLE), 0, 2);
// Writing the Data part:
_audioStream.Write(ENCODING.GetBytes("data"), 0, 4);
// The size of the data will be known once the recording is done.
_audioStream.Write(BitConverter.GetBytes(0), 0, 4);
_mic.Start();
StopRecordingButton.Visibility = Visibility.Visible;
RecordAudioButton.Visibility = Visibility.Collapsed;
}
}
// User has pressed on the 'Stop Audio Recording' button:
private void StopRecording(object sender, GestureEventArgs e)
{
e.Handled = true;
Debug.WriteLine("Stop recording Audio ...");
if (_mic.State == MicrophoneState.Started)
{
_mic.Stop();
_audioStream.Flush();
long endOfStream = _audioStream.Position;
int streamLength = (int)_audioStream.Length;
_audioStream.Seek(4, SeekOrigin.Begin); // Move the 'cursor' to the 1st place holder in the header of the WAVE format.
_audioStream.Write(BitConverter.GetBytes(streamLength - 8), 0, 4); // Total size of the stream minus the first 8 bytes ("RIFF" and this size field).
_audioStream.Seek(40, SeekOrigin.Begin); // Move the 'cursor' to the 2nd place holder, which is 36 bytes further on.
_audioStream.Write(BitConverter.GetBytes(streamLength - 44), 0, 4); // Size of the audio data alone (stream minus the 44-byte header).
_audioStream.Seek(endOfStream, SeekOrigin.Begin);
Debug.WriteLine("Recorded {0}s of audio", _mic.GetSampleDuration(streamLength));
RecordAudioButton.Visibility = Visibility.Visible;
StopRecordingButton.Visibility = Visibility.Collapsed;
// Converting WAV into String format:
_capturedHiddenData = System.Convert.ToBase64String(_audioStream.ToArray());
if (!String.IsNullOrEmpty(_capturedHiddenData))
{
_capturedType = AUDIO_TYPE;
ShareImageButton.IsEnabled = true;
}
else
{
_capturedType = UNDEFINED_TYPE;
ShareImageButton.IsEnabled = false;
}
}
}
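For readers more at home in JavaScript, the same 44-byte header can be laid out with a DataView. This is a sketch under stated assumptions (mono, 16 bits per sample, and a function name `buildWavHeader` invented here); unlike the C# code above it takes the data length up front instead of patching the two size fields afterwards.

```javascript
// Build a 44-byte WAV header for PCM audio.
// Assumptions for this sketch: mono, 16 bits per sample, little-endian fields.
function buildWavHeader(sampleRate, dataLength) {
  const channels = 1;
  const bitsPerSample = 16;
  const blockAlign = (channels * bitsPerSample) / 8;
  const view = new DataView(new ArrayBuffer(44));
  const writeTag = (offset, tag) => {
    for (let i = 0; i < tag.length; i++) {
      view.setUint8(offset + i, tag.charCodeAt(i));
    }
  };
  writeTag(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true); // total size minus the first 8 bytes
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true);             // size of the format part for PCM
  view.setUint16(20, 1, true);              // audio format: 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeTag(36, "data");
  view.setUint32(40, dataLength, true);     // size of the audio data alone
  return new Uint8Array(view.buffer);
}

// One second of audio at an assumed sample rate of 16000 Hz:
const header = buildWavHeader(16000, 32000);
console.log(header.length);                              // 44
console.log(String.fromCharCode(...header.slice(0, 4))); // RIFF
```

The values at offsets 4 and 40 correspond to the two placeholders that StopRecording fills in (stream length minus 8 and stream length minus 44).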
Extracting embedded data
This section discusses how to retrieve the embedded data from the image. I imagined that the user would most likely do this while browsing photos hosted on a web service, in which case the viewer application is a web browser. The photos could equally be hosted locally (as in my examples above), so the browser is a suitable tool in both situations. In an earlier experiment I implemented the decoding with the 2D context of the basic <canvas> element, but it was not successful. It turns out that the pixels returned by getImageData() have been through a premultiplied-alpha conversion, which for us means that the embedded data is destroyed. Some vendors provide flags to turn premultiplication off, but relying on them means writing an almost bespoke solution for each browser, and more often than not turning the flags off does not even help. I therefore took the WebGL route, which was much simpler: I tested the solution on Firefox and Chrome and the script worked without any alterations. Note that at the time of writing Internet Explorer does not support WebGL, but an equivalent solution could be built with Silverlight 5. You can check whether your browser supports WebGL by visiting the Can I use website.
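To see why premultiplied alpha is fatal for data stored in least significant bits, consider the simplified model below. It is an assumption-laden sketch, not a description of any particular browser: it supposes that each color channel is multiplied by alpha/255 and rounded to 8 bits when stored, and divided again when read back.

```javascript
// Simplified model of an 8-bit premultiplied-alpha round trip.
// Real browsers differ in rounding details; this only illustrates the effect.
function premultiplyRoundTrip(channel, alpha) {
  if (alpha === 0) return 0;
  const stored = Math.round((channel * alpha) / 255);       // value kept by the canvas
  return Math.min(255, Math.round((stored * 255) / alpha)); // value read back
}

console.log(premultiplyRoundTrip(200, 255)); // 200
console.log(premultiplyRoundTrip(200, 128)); // 199

// Writing a data bit into the alpha channel can turn 255 into 254.
// Count how many color values no longer survive the round trip in that case:
let changed = 0;
for (let c = 0; c <= 255; c++) {
  if (premultiplyRoundTrip(c, 254) !== c) changed++;
}
console.log(changed > 0); // true
```

In this model fully opaque pixels come back unchanged, but as soon as alpha is anything other than 255 some values shift by one, and a shift of one is exactly what flips an embedded bit.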
The algorithm for this part of the solution is equally simple:
- First prepare your canvas so that it can leverage the WebGL APIs.
- Get a reference to the resource (i.e. URI to the image) and render it as a texture on the canvas.
- Then, read the pixels of the texture (not off the canvas via the getImageData() method) and create a binary sequence out of the ARGB channels of the pixels based on the even-or-odd nature of those values.
- Finally, convert the sequence of bits into a String and process it according to the type of data we are dealing with:
  - If it is pure text, display it somewhere (e.g. step 5 in the Introduction of this article).
  - If it is audio, prepend "data:audio/wav;base64," to the data and set that string as the source of the <audio> element. Because the WAV data was encoded in Base64 by the Lens and HTML media elements support data URLs, the sound plays when the user presses the play button.
Technical details
Because this solution lets the user embed either text or audio, the system must be able to distinguish one from the other in order to deliver the appropriate experience. This is handled by fixing the format in which the embedded data is encoded. For the sake of the demo a very simple scheme is used: a header is prepended to the data before it is encoded. The header has the following structure: ST#<type_of_data>#<length_of_actual_data_in_bytes>#. For example, in the case described in the Introduction of the article, the encoded data is: ST#T#2#Hi. With this in mind, let's look at some code snippets to see how the above algorithm has been implemented.
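Before the viewer code, the header scheme itself can be sketched in a few lines. The function names `wrapPayload` and `unwrapPayload` are made up for this illustration and the separators are taken from the prose description only; the viewer snippet further down relies on its own delimiter constants.

```javascript
// Sketch of the ST#<type>#<length># header described in the text.
function wrapPayload(type, data) {
  return "ST#" + type + "#" + data.length + "#" + data;
}

function unwrapPayload(message) {
  const parts = message.split("#");
  if (parts.length < 4 || parts[0] !== "ST") {
    return null; // no Stegafoto header found
  }
  const length = parseInt(parts[2], 10);
  if (isNaN(length)) {
    return null;
  }
  // The payload starts after the third '#'; it may itself contain '#' characters.
  const start = parts[0].length + parts[1].length + parts[2].length + 3;
  return { type: parts[1], data: message.slice(start, start + length) };
}

console.log(wrapPayload("T", "Hi"));                 // ST#T#2#Hi
console.log(unwrapPayload("ST#T#2#Hi and garbage")); // { type: 'T', data: 'Hi' }
console.log(unwrapPayload("random pixels"));         // null
```

The length field is what tells the decoder where the message ends, and the "ST" marker is what lets it reject an image that carries no embedded content.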
Preparing <Canvas> to use WebGL:
var _canvas = null;
var _gl = null;
var _shaderProgram = null;
// Creates the shader based on the ID of the shader description found in the DOM:
function GetShader(id) {
var shader;
var shaderScriptNode = document.getElementById(id);
if (!shaderScriptNode) {
throw "Could not find a shader script descriptor with ID [" + id + "]";
}
// Walk down the node and construct the shader script:
var script = "";
var currChild = shaderScriptNode.firstChild;
while(currChild) {
if(currChild.nodeType == currChild.TEXT_NODE) {
script += currChild.textContent;
}
currChild = currChild.nextSibling;
}
// Identify the type of shader (Vertex or Fragment):
if (shaderScriptNode.type == "x-shader/x-vertex") {
shader = _gl.createShader(_gl.VERTEX_SHADER);
} else if (shaderScriptNode.type == "x-shader/x-fragment") {
shader = _gl.createShader(_gl.FRAGMENT_SHADER);
} else {
throw "Could not find a valid shader-type descriptor";
}
// Load the script into the shader object and compile:
_gl.shaderSource(shader, script);
_gl.compileShader(shader);
if (!_gl.getShaderParameter(shader, _gl.COMPILE_STATUS)) {
throw "Compilation error in script [" + id + "]: " + _gl.getShaderInfoLog(shader);
}
return shader;
}
function CreateShaderProgram(vsId, fsId) {
var vs = GetShader(vsId);
var fs = GetShader(fsId);
var shaderProgram = _gl.createProgram();
_gl.attachShader(shaderProgram, vs);
_gl.attachShader(shaderProgram, fs);
_gl.linkProgram(shaderProgram);
if(!_gl.getProgramParameter(shaderProgram, _gl.LINK_STATUS)) {
throw "Unable to create shader program with provided shaders."
}
_gl.useProgram(shaderProgram);
return shaderProgram;
}
function InitWebGL(canvasId, VertexShaderScriptId, FragmentShaderScriptId) {
_canvas = document.getElementById(canvasId);
if (!_canvas) {
throw "Could not locate a canvas element with id '" + canvasId + "'";
} else {
try {
_gl = _canvas.getContext("webgl") || _canvas.getContext("experimental-webgl");
console.log("Created WebGL context ...");
_gl.pixelStorei(_gl.UNPACK_PREMULTIPLY_ALPHA_WEBGL, false);
_gl.pixelStorei(_gl.UNPACK_COLORSPACE_CONVERSION_WEBGL, false);
_shaderProgram = CreateShaderProgram(VertexShaderScriptId, FragmentShaderScriptId);
console.log("Created Shader Program ...");
} catch (e) {
_gl = null;
throw "Err: WebGl not supported by this browser.";
}
}
}
// Initializing the canvas called 'WorkingArea':
function Initialize() {
try {
InitWebGL("WorkingArea", "ImgVertexShader", "ImgPixelShader");
// Set the canvas dimensions in the Shader Program (Vertex Shader):
_gl.uniform2f(_gl.getUniformLocation(_shaderProgram, "uCanvasRes"), _canvas.width, _canvas.height);
// Create a buffer for the Texture Coordinate:
_gl.bindBuffer(_gl.ARRAY_BUFFER, _gl.createBuffer());
_gl.bufferData(_gl.ARRAY_BUFFER, new Float32Array([0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0]), _gl.STATIC_DRAW);
var texCoordLocation = _gl.getAttribLocation(_shaderProgram, "aTextureCoord");
_gl.enableVertexAttribArray(texCoordLocation);
_gl.vertexAttribPointer(texCoordLocation, 2, _gl.FLOAT, false, 0, 0);
// Create a texture inorder to load image into it later:
_gl.bindTexture(_gl.TEXTURE_2D, _gl.createTexture());
_gl.texParameteri(_gl.TEXTURE_2D, _gl.TEXTURE_WRAP_S, _gl.CLAMP_TO_EDGE);
_gl.texParameteri(_gl.TEXTURE_2D, _gl.TEXTURE_WRAP_T, _gl.CLAMP_TO_EDGE);
_gl.texParameteri(_gl.TEXTURE_2D, _gl.TEXTURE_MIN_FILTER, _gl.NEAREST);
_gl.texParameteri(_gl.TEXTURE_2D, _gl.TEXTURE_MAG_FILTER, _gl.NEAREST);
// Create a buffer for the rectangle that will "host" the texture:
_gl.bindBuffer(_gl.ARRAY_BUFFER, _gl.createBuffer());
var positionLocation = _gl.getAttribLocation(_shaderProgram, "aVertexPosition");
_gl.enableVertexAttribArray(positionLocation);
_gl.vertexAttribPointer(positionLocation, 2, _gl.FLOAT, false, 0, 0);
} catch (e) {
alert(e);
}
}
Defining the shader objects with GLSL (the shading language used by WebGL) so that the image can be rendered correctly as a texture:
<script id="ImgPixelShader" type="x-shader/x-fragment">
precision mediump float;
uniform sampler2D uImage;
varying vec2 vTextureCoord;
void main() {
gl_FragColor = texture2D(uImage, vTextureCoord);
}
</script>
<script id="ImgVertexShader" type="x-shader/x-vertex">
attribute vec2 aVertexPosition;
attribute vec2 aTextureCoord;
uniform vec2 uCanvasRes;
varying vec2 vTextureCoord;
void main() {
// The coordinate system is a different geometry to how we usually treat images. In an image the "origin" of the coordinate
// system is on the top-left corner. In this system, the origin is at the 'center' with a [-1, 1] range. Hence, we must perform
// the following transformation below in order to calibrate things. The end result is a coordinate system called the Clip-Space
// coordinate system with the origin at the bottom-left.
vec2 inCSCoordPos = ((aVertexPosition/uCanvasRes) * 2.0) - 1.0;
gl_Position = vec4(inCSCoordPos * vec2(1, -1), 0, 1);
vTextureCoord = aTextureCoord;
}
</script>
Extracting the embedded data from the loaded texture:
function ConvertBitArrayToString(bitArr) {
var str = "";
for (var i=0; i<bitArr.length; i+=8) {
var val = 0;
for (var j=0, shiftCtr=7; j<8; j++, shiftCtr--) {
val += (bitArr[i+j] << shiftCtr);
}
str += String.fromCharCode(val);
}
return str;
}
function ReadLine(lineNumber) {
var bitArray = new Array();
var pixelArrayInRGBA = new Uint8Array(4 * _canvas.width);
_gl.readPixels(0, lineNumber, _canvas.width, 1, _gl.RGBA, _gl.UNSIGNED_BYTE, pixelArrayInRGBA);
for(var i = 0; i < pixelArrayInRGBA.length; i++) {
bitArray[i] = pixelArrayInRGBA[i] % 2;
}
return ConvertBitArrayToString(bitArray);
}
function DecodeAsStegaFotoImage() {
console.log("Decoding as a StegaFoto ...");
// Because the origin is at the bottom-left in clipspace coordinate, this means that 1st pixel
// row of the image is actually at the very bottom of the canvas:
var lineCounter = _canvas.height - 1;
var line = ReadLine(lineCounter);
var indexOfDelimeter = line.indexOf(_MESSAGE_DELIM);
if (indexOfDelimeter > 0) {
var arrParts = line.substring(0, indexOfDelimeter).split(":");
var lengthOfData = parseInt(arrParts[2], 10);
var typeOfEmbeddedData = arrParts[1];
var data = "";
var fromIdx = indexOfDelimeter + _MESSAGE_DELIM.length;
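// Each character occupies 2 pixels, so work out how many further rows the header plus data spill into:
var numberOfLinesToRead = Math.floor(((fromIdx + lengthOfData) * 2) / _canvas.width);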
if(numberOfLinesToRead > 0) {
for(var i=1; i <= numberOfLinesToRead; i++) {
line += ReadLine(lineCounter - i);
}
}
data = line.slice(fromIdx, fromIdx + lengthOfData);
// 'T' means that data must be interpreted as pure text. While 'A' implies that data must be treated as a the data part of data-url:
if (typeOfEmbeddedData == "T") {
document.getElementById("EmbeddedMessage").value = data;
} else if (typeOfEmbeddedData == "A") {
document.getElementById("AudioPlayer").src = "data:audio/wav;base64," + data;
}
}
}
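The row arithmetic used by the decoder can be checked by hand with a small helper. This is a sketch: `rowsNeeded` is a name invented here, and it assumes four data bits per pixel (one per channel), which means 2 pixels per character.

```javascript
// How many pixel rows hold totalChars characters in an image that is width pixels wide?
// Assumes 4 data bits per pixel, i.e. 2 pixels per 8-bit character.
function rowsNeeded(totalChars, width) {
  const pixelsNeeded = totalChars * 2;
  return Math.ceil(pixelsNeeded / width);
}

console.log(rowsNeeded(9, 640));      // 1
console.log(rowsNeeded(320, 640));    // 1
console.log(rowsNeeded(321, 640));    // 2
console.log(rowsNeeded(153600, 640)); // 480
```

The first call corresponds to the 9 characters of "ST#T#2#Hi", which fit comfortably in the first row. The last call shows that 153600 characters fill every row of a 640 x 480 image.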
Downloads
The source code can be downloaded from here: Media:Stegafoto SRC.zip
Conclusion
I hope that the way I have structured the article makes the approach clear for both programmers and non-programmers. Hopefully, the videos and images (schemas) support your understanding on how the Stegafoto app works.







Hamishwillee - Subedited/Reviewed
Hi vnuckcha
Thank you for this extremely fun article - the concept is new to me. I think the way you structured it is quite good - I barely glanced at the code and have a fair idea how it works.
I have given this a very minor review for wiki style (could do a bit more, but holding off for now):
Two "general" suggestions
In terms of the article I think it is both interesting and useful (the technique was new to me). Two things are slightly incorrect
Thoughts?
Regards
Hamish
hamishwillee 08:01, 31 January 2013 (EET)
Vnuckcha - In reply to the above comment
Hi Hamish,
Thanks for re-arranging the structure of the document so that it meets the Wiki's styling requirement. Please find below answers to your questions:
Q: could you confirm what device you tested this on? A: I tested the Lens on Lumia 920 and 820 and the WebGL on Firefox 18 and Chrome 24
Q: Would it be possible to add a zip containing your WP project with this code? A: Yes, i have updated the source code for both the Lens (Windows Phone Project) and the Viewer (Javascript + HTML). Remember that the code is fragile (e.g. do not record long sentences as it would crash the app) and is provided as is.
Q: Can you please update your profile (vnuckcha ) to say a little about you and them make it public? A: I have made my profile public but do not know what to say about myself that is of interest.
Some responses to some of the above comments: C: "Once this case is understood, the audio part will be explained on top of that " - it isn't explained, though I think you could do this in a few seconds R: I actually explain it at the bottom of that section. Ref:
C: In the image with before and after the numbers in the boxes are the same R: You are correct here. I updated the illustration accordingly. Thanks !
C: That confuses me a bit, p1 appears to have changed just the most significant byte while p2 appears to have changed the least significant. R: You are confused because you are assuming that there is a most and least significant byte. Allow me to explain:
Thanks again for the comments and corrections :)
Vik
vnuckcha 09:49, 31 January 2013 (EET)
Vnuckcha - Comment output is not good
Hi Hamish,
I am unable to edit my comments above so that they are readable. It would be great if you could correct the formatting exception that is creating the above mess (horizontal scrollbar).
Thanks in advance.
vnuckcha 10:09, 31 January 2013 (EET)
Hamishwillee - Further subedit - needs a new review
Hi Vik
Thanks very much!
I now understand this better, but I think the "simple" explanation is too complicated. The main problem was that we talk about 1s and 0s and odds and evens and switching values.
What I think is actually happening is that you're just replacing the Least Significant bit of each channel with your data to be encoded (ie not "flipping") (of course by replacing the value might not actually change). Is that correct? If not, then I have to say, I'm still confused.
Its also not clear to me how you know you've reached the end of your encoded content (ie "hi"?) or that there is encoded content in a particular image?
I have reworded this below (ie in this comment). Can you check that it looks OK. This will need a little tidying but I think it is a better way of explaining what is going on. Even if I'm wrong in my explanation, I think this "structure" is better and should be adopted for the section rather than algorithm steps.
I haven't bothered to change your comments yet because I can read them, and I'll delete them when we're finished in total. If you want to do this you press the admin link
Note, I also refer to the process as embedding now.
Regards H
Embedding the data
There are two parts to embedding the data:
Converting data to a binary sequence
Converting text data into a binary sequence is easy - we simply use some unique sequence of '0' and '1' to represent each character, and then string the sequences together. For example, using the ASCII encoding 'H' is assigned to 72 decimal (which is 01001000 as a base 2 number stored as 8-bits in a computer) and 'i' is assigned to 105 in decimal (01101001 in binary); "Hi" can therefore be represented as the binary sequence 0100100001101001.
Recording audio (as in the above video) is the same process, except that we have to convert the captured sound from the microphone into a binary sequence. This is done in the following manner:
Encoding data into the image
Encoding data into the image is only a little more complicated.
An image is constructed from a huge number of coloured dots or "pixels". Using the ARGB colour model, each pixel is made up of four bytes (a byte is 8 bits), which contain the values the Alpha (opacity/transparency), Red, Green and Blue "channels" respectively (these channels define the final appearance of the pixel.)
As each of the channels has 8 bits it can represent a value from 0 to 255 in decimal (00000000 to 11111111 in binary). The right-most "bit" is called the least significant bit; if this bit is changed, the decimal value of the channel value will change by one (for example 11111111 = 255 to 11111110 = 254).
To encode our data we'll set the least significant bit of each of the channels to the data we want to encode. This may not change the value of a particular bit (if it is already the same as the data bit), but even if it does, such a change will be virtually undetectable to the human eye.
As each pixel has four channels it can store four bits of data. This means that we'll need 2 pixels to store the 8 bits for every character, and 4 pixels for our "Hi" string (16 bits).
<image here maybe>
If we use a 640 x 480 image resolution, then we can store (640 x 480 / 2 =) 153600 bytes of data (characters). This corresponds to about 7 seconds of audio based on approach used in the previous section.
hamishwillee 07:09, 1 February 2013 (EET)
Vnuckcha -
Hi Hamish,
I rewrote the encoding section completely by avoiding to explain certain things. I think that it abstracts things even more (e.g. no talk of HEX and ASCII). I also abstracted the explanation about the odd-even business and replaced the illustrations with another one. All in all, i hope that the article is more comprehensive while retaining its essence.
Hear from you.
Vik
vnuckcha 13:58, 5 February 2013 (EET)
Pooja 1650 - Nice article!
Hello Vnuckcha,
Your article is very interesting. The way you described the things is also good.
Keep it up!
Thanks,
Pooja
pooja_1650 14:53, 5 February 2013 (EET)
Vnuckcha -
Thanks Pooja!
Kind Regards,
Vik
vnuckcha 14:02, 7 February 2013 (EET)
Hamishwillee - This is much better
Hi Vik
Yes, this is pretty good - much better than it was. Thank you.
There is probably a bit more that could be done to subedit - will try find time later today (but if not it is perfectly acceptable).
Regards
Hamish
hamishwillee 07:33, 11 February 2013 (EET)
Aakash95 - excellent wiki
excellent wiki here everyone, thank you!
Aakash95 04:33, 20 February 2013 (EET)
Aady - Very Nice Article
Loved the concept & article !!!! Good one :)
Regards,
Aady
Aady 12:57, 22 February 2013 (EET)
Yan -
Hi. Fun article. I think you should explain why you use PNG instead of JPG.
I'm not sure. For me it is only a monitor display problem. If Imagetools changes your RGB values, your process will not work.
yan_ 17:52, 22 February 2013 (EET)
Vnuckcha - Thanks
Thanks Aakash, Aady and Yan for your kind words. I am glad you found the article interesting and fun :)
@Yan - I did not want to explain why I chose PNG over JPG in the main article, as it adds to the length of the article and is a side issue to its main purpose. I used PNG instead of JPG because it guaranteed that the pixel changes I made would be preserved. JPG has a convoluted algorithm in which the pixel values can change very slightly, and that is enough to destroy the data in the prototype I was making. Of course one could devise a more resilient algorithm for storing data, and I left that as an exercise for the reader.
Thanks again to all three of you for your kind words.
vnuckcha 08:08, 25 February 2013 (EET)
Yan -
Hi. I know the difference between PNG and JPEG. But that is not the case for every reader.
I think your explanation is good and could be added to your article ;)
yan_ 08:45, 25 February 2013 (EET)
Vnuckcha - Changes made :)
Hi Yan,
I added that explanation to the article. Thanks for the feedback.
Cheers
vnuckcha 09:23, 25 February 2013 (EET)