Saturday, July 31, 2010

Video Games & Online Games

This post is an attempt to disambiguate the domain of virtual humans. Most people have never heard the term “Virtual Human” (VH) before, but most of them play games (online or offline) and most have interacted with some limited form of a VH on the web.

Computer games (online and offline) are the closest thing most people have encountered to the domain of virtual humans.

Online games (e-gaming)

You could argue that online games are much simpler than video games, but they are steadily becoming more complex. As in video games, fully fleshed-out avatars are widely used to immerse the player in the scenario. Below is an example of a poker game I found, from a company called PKR. Notice the use of body language, facial expressions, and other cues to create a realistic poker simulation.

Video Games:

Below is a screenshot from my favourite game, Mass Effect:

[Screenshot: Mass Effect 2 dialogue wheel] Source: http://www.jpbrown.co.uk/reviews.html

Notice the use of the dialogue wheel to simulate conversations between avatars. There is an excellent analysis of this particular style of conversation here.

However, in contrast to most current video games, virtual humans engage players in actual dialogue, using speech recognition, dialogue-system technology, and emotional modelling to deepen the experience and make it more entertaining. Such technologies have only recently started to find their way into video games. Tom Clancy's EndWar, for example, uses speech recognition to let players give commands to their armies.

[Screenshot: Tom Clancy's EndWar] Source: http://www.the-chiz.com/

Some games go even further and use full natural language processing.

Virtual Humans on the Web:

There are a lot of very superficial virtual humans on the web. This is perhaps one of the main reasons they have so far failed to become a mainstream interface. A proper virtual human should be the whole package: emotion modelling, cognition, speech, dialogue, domain strategies and knowledge, gestures, and so on. Avatars like Anna of IKEA are mere drawings with very limited dialogue abilities, and are simply there to create a more interesting FAQ (Frequently Asked Questions) section. There is still some way to go before we see full-scale avatars on the web, but we will get there.

 

[Screenshot: IKEA's Anna assistant] Source: http://www.ikeafans.com/forums/swaps-exchanges/1178-malm-bed-help.html

 

Friday, July 30, 2010

E-Learning Prototype

Below is the prototype of an e-learning system that I was asked to design by a company. As I cannot draw, I decided to use Microsoft Word to communicate my ideas. There must be a good storyboarding tool out there that could help me streamline the process.

The design below is based on existing, proven technologies that can be easily integrated into existing e-learning platforms. Codebaby, a Canadian company, has already been using avatars (such as those shown in my design) [1] in e-learning very successfully for several years. The picture in the last screen of the design is a virtual classroom [2] created in the popular Second Life platform.

 
Compare my solution with the “conventional” e-learning platform shown below. Although I do include several GUI (Graphical User Interface) elements in my work, it is obvious that: a) my interface is minimalist, with fewer elements on the screen; and b) accessibility is greater, because instead of clicking through multiple links to accomplish a task, you can simply “ask” the system using the most natural method you know: natural language. The benefits of avatar-assisted e-learning will become even more evident as the web progresses from its current form towards Web 2.0 and ultimately Web 3.0. For now, such solutions should at least be offered as an augmentation to “conventional” GUI-based interfaces. All companies want something more, for example easier access to module contents and the “wow” factor for their products; they just don’t know what it is until you show it to them.
 
[Screenshots: the proposed prototype and a conventional e-learning platform]
 
Although the proposed design is based on mature and well-tested technologies, I can understand if someone prefers a purely GUI solution. In fact, I would be more than happy to assist them. I have been working with GUI interfaces for several years, long before I developed an interest in avatar technologies. I developed my first e-learning tool back in 1998 (12 years ago): an educational CD-ROM about Bradford University's robotic telescope platform.
 

[1] http://www.codebaby.com/showcase/

[2] http://horizonproject.wikispaces.com/Virtual+Worlds

Heuristics vs. User Research

People keep asking me about the W3C accessibility guidelines – a set of heuristics that should guide designers towards more accessible websites. Of course, these are not the only guidelines out there: the BBC has its own accessibility guidelines, and there are several sets for web usability as well. Although I am familiar with the W3C guidelines, I didn’t use them in my MGUIDE work because I didn’t find them relevant. The reason is that the W3C guidelines are written specifically for web content, not for multimodal content. Research in the area of virtual humans provides more relevant heuristics, but there is still room for substantial additions and improvements. Instead of heuristic evaluation, I decided to build my own theoretical framework to guide my research efforts. The framework is based on the relevant literature in the area and on well-documented theories of human cognition. It provides all the necessary tools for iterative user testing and design refinement.

There is no doubt that relying on user testing is costly and time-consuming. It becomes even more difficult when you have to deal with large groups of people, as I did in MGUIDE. However, the cost and time can be minimised with the proper tools. For example, the Global Lab project has created a virtual lab (on the popular Second Life platform) in which projects are accessible to anybody, anytime, from anywhere. Newer research methods like eye tracking and emotion recognition can reveal user insights with a relatively small group of people and with minimal effort. Soon enough, perhaps, tools will include routines that compute detailed statistics with minimal intervention. User testing definitely has some way to go before it becomes mainstream, but I am sure we will get there.

Until then, inspection methods (e.g., cognitive walkthroughs, expert surveys of the design, heuristic evaluations, etc.) are used to replace user testing. In such a process, some high-level requirements are usually prototyped and then judged by an expert against established guidelines. A major problem with this approach, though, is that there are over 1,000 documented design guidelines [1]. How do you choose which one is appropriate for a specific context? It is my understanding that each institution or professional uses a set of best-practice guidelines adapted from the relevant literature and from years of experience. However, even if these guidelines have worked in the past, it doesn’t mean they will work again. Technology is progressing extremely fast, and people become more knowledgeable and more accustomed to technology every day. Therefore, even when inspection methods are used, some form of user testing is necessary. A focus group with a couple of users, for example, can provide enough user insight to amend a design as necessary.

[1] http://www.nngroup.com/events/tutorials/usability.html

 

 

Wednesday, July 28, 2010

Emotion Recognition for Accessibility Research

There are a number of quantitative techniques that can be used in user research on avatar-based interfaces. Apart from the “usual” techniques for gathering subjective impressions (questionnaires, tests, etc.) and performance data, I also considered a more objective technique based on emotion recognition. In particular, I thought of evaluating the accessibility of the content presented by my systems through the recognition of emotional expressions. The main hypothesis is that the perceived accessibility of a system's content is evident in the user's emotional expressions.

If you think about it for a while, the human face is the strongest indicator of our cognitive state and hence of how we perceive a stimulus (information content, an image, etc.). Emotion measures (both quantitative and qualitative) can provide data that augment any traditional technique for accessibility evaluation (e.g., questionnaires, retention tests, etc.). For example, with careful logging you can see which part of your content is more confusing, which part requires users to think harder, and so on (a rough sketch of this kind of per-segment logging appears at the end of this post). In addition to the qualitative data, the numeric intensities can be used for some very interesting statistical comparisons. Manual coding of the video streams is no longer necessary, as there are a number of tools that allow automated analysis of facial expressions. To my knowledge, the following tools are currently fully functional:

1) Visual Recognition


2) SHORE

[Screenshot: SHORE facial expression analysis (Mimikanalyse)]

The idea is fully developed, and I am planning to release a paper on it very soon. Finally, if we combine this technique with eye tracking, we can reveal even more user insights about avatar-based interfaces. We could try, for instance, to identify which aspect of the interface causes a particular facial expression (positive or negative). For example, one participant in my experiments mentioned that she couldn’t pay attention to the information provided by the system because she was watching the guide’s hair waving; a stimulus like that usually produces a calm expression. This comment is just an indication of the user insights that can be revealed if these techniques are successfully combined.
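To make the per-segment logging idea above a little more concrete, below is a minimal sketch in Python (not part of my actual analysis code). The log format, emotion labels, and segment names are invented for illustration; real tools such as SHORE export their own schemas.

# Minimal sketch: average automatically logged expression intensities per
# content segment, to see which parts of a presentation confuse or engage users.
# The log format, emotion labels, and segment names below are hypothetical.
from collections import defaultdict
from statistics import mean

# (timestamp in seconds, emotion label, intensity 0..1), as a recogniser might log them
expression_log = [
    (1.2, "surprise", 0.40),
    (3.8, "confusion", 0.75),
    (4.1, "confusion", 0.80),
    (9.5, "happiness", 0.60),
]

# Content segments of the presentation: (name, start second, end second)
segments = [("introduction", 0, 5), ("route description", 5, 12)]

def mean_intensity_per_segment(log, segments):
    """Return the average intensity of each emotion within each content segment."""
    buckets = defaultdict(list)
    for timestamp, emotion, intensity in log:
        for name, start, end in segments:
            if start <= timestamp < end:
                buckets[(name, emotion)].append(intensity)
    return {key: mean(values) for key, values in buckets.items()}

for (segment, emotion), score in sorted(mean_intensity_per_segment(expression_log, segments).items()):
    print(f"{segment}: {emotion} (mean intensity {score:.2f})")

The resulting per-segment averages are exactly the kind of numbers that can feed the statistical comparisons mentioned above.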

Tuesday, July 27, 2010

Accessibility/Universal Access

I recently found a good resource [1] on accessibility from a company called Cimex that says what most designers and UX specialists fail to see: when you design for accessibility, you do not cater only for less able users. You are making sure that your content is open and accessible to a variety of people and machines, using whatever browser or method they choose.

Now, if you cater for a variety of people of different physical, cognitive, emotional, and language backgrounds, and for the methods they choose to use, you end up with Universal Access.

It is difficult to achieve the goals of Universal Access with traditional interfaces. Virtual humans as interfaces hold high potential for achieving the goals of UA, because the modalities used in such interfaces (e.g., natural language, gestures, facial expressions, and others) are the ones our brains have been trained to understand over thousands of years. Virtual humans can speak several languages with minimal effort (see the Charamel showroom). Their body and facial language can easily be adjusted to highlight the importance of a message. Sign language can be used to communicate with less able users (no other interface can currently accomplish that). Accurate simulation of interpersonal scenarios (e.g., a sales scenario) can ensure that your message gets across as effectively as it would if a real person were delivering it.

In my work I went as far as universal accessibility, comparing the effects of virtual human interfaces on the cognitive accessibility of information under simulated mobile conditions, using groups of users with different cultural and language backgrounds. To make the information easier to access, I used a variety of methods found in VH interfaces (e.g., natural language processing, gestures, interpersonal scenario simulation, and others). By making the main functions of a system easier to access, you ultimately make the interface easier to use, and hence it was natural to investigate some usability aspects as well (e.g., ease of use, effectiveness, efficiency, user satisfaction, etc.). These are all aspects of the user experience (UX), i.e., the quality of the experience a person has when interacting with a product. I cannot release any more information at this stage, as the necessary publications have not yet been made.

In the future, I believe the existing technologies will merge into two mainstream platforms: a) robotic assistants on the one hand, and b) software assistants/virtual human interfaces on the other. Accessing the services these systems offer will be as easy as our daily interactions with other people. The barriers that exist today (cognitive, physical, etc.) will become a thing of the past.

Monday, July 26, 2010

MGUIDE Development Process

I thought it would be a good idea to explain the methodology followed in the development of the MGUIDE prototypes. Because the focus was mainly on the research outcomes, the development methodology was of little concern to the stakeholders involved. Creating interpersonal simulations like those found in real life is a process most compatible with the Scrum development methodology (shown below). I am planning to write a paper on the topic, and hence I will not say much in this post.

[Diagram: the Scrum process]

Source: Wikipedia

Gathering user requirements can be done in a variety of ways. I followed a combined literature/user-evaluation approach. One of my earliest prototypes was developed using guidelines found in the literature. The prototype was then evaluated with actual users, and a set of new requirements was developed. These requirements are what Scrum refers to as the “product backlog”. Each sprint (in my case usually 1–3 months) a set of requirements was developed and tested, and then replaced by a new set. Simulating interpersonal scenarios gives you the freedom to augment the product backlog with new requirements quite easily. Using research methods like direct observation and note-taking, you can document the interactions found in the scenarios you want to simulate. My scenario was a guide agent, so I went on a number of tours, where I made a number of interesting observations. Most of my findings were actually implemented in the MGUIDE prototypes, but others still remain in the product backlog. Of course, these requirements and the work done in MGUIDE are enough to inform artificial intelligence models of behaviour in order to create completely automated systems.
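As a toy illustration of this process, a product backlog can be kept as a prioritised list from which each sprint draws its work items. The requirement names, priorities, and sprint capacity below are invented for illustration, not MGUIDE's actual backlog.

# Toy sketch of a Scrum-style product backlog; requirements and priorities
# are invented for illustration only (lower number = higher priority).
backlog = [
    (1, "Guide points at landmarks while giving directions"),
    (2, "Guide answers follow-up questions about a landmark"),
    (3, "Guide adapts its speech rate to the visitor's walking pace"),
]

def plan_sprint(backlog, capacity):
    """Pick the highest-priority items that fit into the next sprint."""
    ordered = sorted(backlog, key=lambda item: item[0])
    return ordered[:capacity], ordered[capacity:]

sprint_items, backlog = plan_sprint(backlog, capacity=2)
for priority, requirement in sprint_items:
    print(f"Sprint item (priority {priority}): {requirement}")

Each evaluation round then appends new requirements to the remaining backlog, which is how the observations from the tours ended up there.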

This iterative process was repeated prior to the actual user research stage, where the full set-up of the MGUIDE evaluation was tested. I used a small group of people who tried to find bugs in the software, problems with the data-gathering tools, and so on. The problems were normally corrected on site and the process repeated. Once I had ensured that all my instruments were free of problems, the official evaluation stage of the prototypes started.

To close this post, I must highlight the need for future research into gathering data about the different situations in which interpersonal scenarios occur. In reality, different situations produce different reactions in people, and this should be researched further. Only through detailed empirical experimentation can we ensure that future avatar-based systems will deliver superior user experiences.

 

Friday, July 23, 2010

Speech Recognition

In order to successfully simulate an interpersonal scenario with a virtual human, you need speech recognition (in real life we speak to each other; we do not click buttons or type text). For this reason, I have been following the evolution of the speech recognition industry closely for some time now.

During the MGUIDE project I successfully integrated speech recognition into one of my prototypes. I used the Microsoft Speech Recognition Engine 6.1 (SAPI 5.1) with dictation grammars, which I developed in pure XML using the Chant GrammarKit. The grammars look like this:

<RULE name="Q1" TOPLEVEL="ACTIVE">
<l>
<P><RULEREF NAME="want_phrases"/>to begin</P>
<P><RULEREF NAME="want_phrases"/>to start</P>
<P><RULEREF NAME="want_phrases"/>to start immediately</P>
</l>
<opt>the tour </opt>
<opt>the tour ?then</opt>
</RULE>
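To show how such a rule constrains what the recogniser will accept, here is a small Python sketch (not part of the MGUIDE code). The want_phrases alternatives are invented, since that sub-rule is defined elsewhere in my grammar files, and the optional tails are a simplification of the two OPT elements above.

# Toy illustration of how the Q1 rule constrains recognition.
# The want_phrases alternatives are hypothetical; the real sub-rule lives
# in a separate part of the grammar.
from itertools import product

WANT_PHRASES = ["i want", "i would like"]                 # hypothetical <want_phrases>
VERB_PHRASES = ["to begin", "to start", "to start immediately"]
OPTIONAL_TAILS = ["", " the tour", " the tour then"]      # "" means the option was skipped

def accepted_phrases():
    """Expand the simplified rule into every word sequence it accepts."""
    return {f"{want} {verb}{tail}"
            for want, verb, tail in product(WANT_PHRASES, VERB_PHRASES, OPTIONAL_TAILS)}

def matches_q1(utterance):
    """True if the utterance falls within the constraints of the simplified Q1 rule."""
    return utterance.lower().strip() in accepted_phrases()

print(matches_q1("I want to start the tour"))   # True: inside the grammar
print(matches_q1("Let's get going"))            # False: outside the grammar

A real engine, of course, does acoustic matching rather than string comparison, but the set of accepted phrases is the same idea: anything outside it is what produces the funny misrecognitions mentioned below.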

I also voice-enabled the control of my system's interface, so if you said “Pause”, the virtual guide would pause its presentation. I briefly tested both modes with one participant in the lab. In dictation mode, with just a couple of minutes of training, Microsoft's engine performed with 100% accuracy within the constraints of the grammar. For completely unknown input, the engine performed with less than 40% accuracy. In CnC mode, the engine worked with 100% accuracy without any training. Of course, SAPI 5.4 in Windows 7 offers much better recognition rates in both dictation and CnC modes. I haven't tried SAPI 5.4 yet, but it is within my plans for the future. I think that true speaker-independent recognition (i.e., without training) in indoor environments is only five years away, at least for the English language.

In mobile environments, Siri appears to be the only solution out there that realises the idea of a virtual assistant on the go using speech recognition. Siri uses dynamic grammar recognition, similar to my approach. If you say something within the constraints of the grammar, the accuracy of recognition approaches 100%. However, as with my prototype, if you say something outside the grammar files, the recognition results can be really funny.

Statement to Siri: Tell my husband I’ll be late

Reply: Tell my Husband Ovulate (he he he)

[Image: Siri speech recognition example]

Source: http://siri.com/v2/assets/Web3.0Jan2010.pdf

Terminology:

Dictation speech recognition: the type of speech recognition in which the computer tries to translate whatever you say into text.

Command and Control (CnC) mode: the type of speech recognition used to control applications through a fixed set of spoken commands.

MGUIDE Project Funding

As people keep asking me about the funding of the MGUIDE project, I thought I would post this in an attempt to clarify the situation further. MGUIDE was a large and very sophisticated project, and money came from a variety of sources. The project started in 2007, and until 2008 Middlesex University was the main funding body and my last official employer. The package from Middlesex University covered my project expenses for that year and required me to perform a maximum of 15 hours of teaching per week. Two other universities and six companies also provided support in the form of know-how and funding for tools and hardware. From 2008 until June 2010, I was able to secure funding from an angel investor and, thankfully, the continued support of the companies and universities. The idea with MGUIDE was, and still remains, to develop a commercial product out of it. However, because of the bad economic climate, my investor decided not to proceed any further. I still hope that this work will appeal to a company and that I will be able to see MGUIDE as an application for the iPad or another tablet-based system.

Wednesday, July 21, 2010

Project Management - E-Learning Projects

This post is not related to the MGUIDE project, but to the work I did at Middlesex University. Most of the modules I taught there were project-based. Usually I had to guide several groups of students (20–30 students) through the design and development of projects. One particular project involved the design and implementation of e-learning games for autistic children. Each of my students was given a case study describing the requirements of particular autistic children (as captured by their teachers, for instance that a child needs help understanding emotional expressions) and had to produce a game under my guidance. The game was then sent to the relevant school for full-scale evaluation. Each semester I usually ended up marking 100–200 games, with at least 80% of them being top class. Below is an example of the projects produced under my guidance. All material is copyrighted by Middlesex University, so please ask before you copy anything:

All multimedia elements (including the designs) were produced by my students using Adobe Photoshop. I discussed the tools needed for game development, along with best-practice techniques, in detail in class.

Copyright by Middlesex University – Please do not copy

Each game was evaluated in class (by me and the students). The games were then sent to the schools for formal evaluation by the children and their teachers. Below is a sample of the heuristic evaluation sheet used in class:

| Criterion  | What to look for                                                                                                            | Negative comments | Positive comments |
| Background | First scene uses a suitable background, clearly states your own title for the topic, and gives clear instructions          |                   |                   |
| Text       | Varied and clear use of text (spelling and grammar checked; not too small, not too large, not unclear, no inappropriate words) |                   |                   |

Monday, July 19, 2010

Art Assets

Below are samples of the artwork I completed for the MGUIDE project. Although I have several years of experience designing in Adobe Photoshop, I do not consider myself a designer. Design is interesting, but I prefer programming and user-based research. However, if a project requires me to produce art assets, I am perfectly capable of doing that as well.

 

Friday, July 2, 2010

Experiment 4 setup

Due to demand, I decided to provide some information on the set-up of my experimental work during the evaluation stage of MGUIDE. The information below is the briefing participants had to read for experiment 4. Please note that the main data-collection technique in experiment 4 is the think-aloud protocol. I conducted the testing with two user groups of six participants each.

The purpose of this experiment is to investigate the possible effects of two mobile path-finding systems of variable complexity on your ability to find your way around the castle. You will use the systems to navigate two different routes, visiting a number of landmarks in turn (10 to a maximum of 18), using system A on one route and system B on the other. The total duration of the experiment does not exceed 20 minutes per route. For the purpose of the experiment, I have created two video applications representing each route of the castle in detail. In each video clip you will hear the question “What would you do at this particular point if you were in the castle?” You must answer the question based on the visual (i.e., gestures and landmarks) and/or audio instructions delivered by the system.

For example:

Given this instruction: “From where you are, if you look to your right, you will see two chimneys. Opposite the chimneys there is a path that leads to another square. Please follow this path until you come across a house with a black front door!”


And this clip:
[Video clip frame from the castle route]

You will have to answer something like: “I will follow the path to the right of the tree until I see the house with the black front door.” The next video will show the result of this action (i.e., that you have moved towards the house with the black door) and will pose a new navigational challenge.