<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5882627736725710289</id><updated>2012-02-04T22:04:25.872-05:00</updated><category term='apache'/><category term='mediawiki'/><category term='LAMP'/><category term='javascript'/><category term='documentation'/><category term='mysql'/><category term='open science'/><category term='php'/><category term='ajax'/><category term='tag extension'/><category term='macros'/><category term='OWW'/><category term='hadley centre'/><category term='climate change'/><category term='API'/><category term='relatedness'/><category term='design concept'/><category term='electronic lab notebook'/><category term='trac'/><category term='python'/><category term='software engineering'/><category term='myelink'/><category term='screenshot'/><category term='plugins'/><category term='prototype'/><category term='google'/><title type='text'>maria yancheva</title><subtitle type='html'>software engineering for climate change scientists: notes from an undergraduate summer research project</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>22</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-3329057737850006709</id><published>2009-08-21T00:23:00.008-04:00</published><updated>2009-08-21T00:55:37.713-04:00</updated><title type='text'>Demo server + screencast + poster</title><content type='html'>Demo:&lt;br /&gt;&lt;a href="http://www.cs.toronto.edu:40154/mediawiki"&gt;http://www.cs.toronto.edu:40154/mediawiki&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Screencast:&lt;br /&gt;&lt;a href="http://www.cs.toronto.edu/%7Eyancheva/myelink/myelink-screencast.htm"&gt;http://www.cs.toronto.edu/~yancheva/myelink/myelink-screencast.htm&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Poster:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.cs.toronto.edu/%7Eyancheva/myelink/myelink-poster.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 400px; height: 300px;" src="http://3.bp.blogspot.com/__iI3woR69XA/So4iLFqe3FI/AAAAAAAAAB8/7b7zF4Yn1dE/s400/CS-poster-raster.jpg" alt="" id="BLOGGER_PHOTO_ID_5372268979393846354" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-3329057737850006709?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/3329057737850006709/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/08/demo-server-screencast-poster.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5882627736725710289/posts/default/3329057737850006709'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/3329057737850006709'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/08/demo-server-screencast-poster.html' title='Demo server + screencast + poster'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/__iI3woR69XA/So4iLFqe3FI/AAAAAAAAAB8/7b7zF4Yn1dE/s72-c/CS-poster-raster.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-105832586142354501</id><published>2009-08-12T10:03:00.003-04:00</published><updated>2010-01-31T13:07:06.037-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediawiki'/><category scheme='http://www.blogger.com/atom/ns#' term='tag extension'/><category scheme='http://www.blogger.com/atom/ns#' term='relatedness'/><title type='text'>Mediawiki: Importing large batches of pages from Wikipedia</title><content type='html'>How to import/export to Mediawiki&lt;br /&gt;&lt;br /&gt;After a few days of trying out various methods of importing (a large number of) pages into my local Mediawiki (including writing my own import script, before I realized Mediawiki had export/import capabilities), I've come up with a procedure that works. 
Below are the steps, plus solutions to common problems that occur when importing.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Go to the Special:Export page of the source wiki, e.g. &lt;a href="http://en.wikipedia.org/wiki/Special:Export"&gt;http://en.wikipedia.org/wiki/Special:Export&lt;/a&gt;, and select the articles you want to export. Do NOT select "Include Templates"; then click Export. If you select "Include Templates", the XML file will simply include the template syntax in the pages which use templates, but _not_ the actual source templates. The result is a lot of ugly syntax in the imported pages, which makes you wonder whether the import worked. (By the way, it's interesting that most of the pages in Wikipedia seem to use templates extensively - I realized this when I imported using the "Include Templates" option.)&lt;/li&gt;&lt;li&gt;Now go to the Special:Import page of the target wiki, choose the XML file you just downloaded, and press Import.&lt;/li&gt;&lt;/ol&gt;Most likely, you will get one or more of these errors. Here are solutions to the ones I encountered:&lt;br /&gt;&lt;ul type="square"&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Session timeout&lt;/span&gt; - the file is too big. Try importing smaller batches. 
(Similarly if you get a maximum file upload size type error).&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:100%;"&gt;&lt;span class="mw-headline"&gt;&lt;span style="font-weight: bold;"&gt;Fatal error: Allowed memory size of nnnnnnn bytes exhausted (tried to allocate nnnnnnnn bytes)&lt;/span&gt;  - this is a PHP error - &lt;a href="http://www.mediawiki.org/wiki/Manual:Errors_and_Symptoms#Fatal_error:_Allowed_memory_size_of_nnnnnnn_bytes_exhausted_.28tried_to_allocate_nnnnnnnn_bytes.29"&gt;here's how to resolve it&lt;/a&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Error in fetchObject()": Illegal mix of collations for operation ' IN ' (localhost)&lt;/span&gt; - this is a mediawiki problem, &lt;a href="http://www.mediawiki.org/w/index.php?title=Project:Support_desk/Sections/Uploading&amp;amp;oldid=269900#Special:Import_error"&gt;luckily there's a quick fix here&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;For completeness, &lt;a href="http://www.cs.toronto.edu/%7Eyancheva/mediawiki-import.py"&gt;here's the Python import script I wrote&lt;/a&gt;. It seems to work but no guarantees. 
It's always better to try the MediaWiki import first.</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/105832586142354501/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/08/mediawiki-importing-large-batches-of.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/105832586142354501'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/105832586142354501'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/08/mediawiki-importing-large-batches-of.html' title='Mediawiki: Importing large batches of pages from Wikipedia'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-6218504387287838592</id><published>2009-07-15T14:43:00.001-04:00</published><updated>2009-07-15T14:46:45.539-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediawiki'/><category scheme='http://www.blogger.com/atom/ns#' term='tag extension'/><title type='text'>Input validation, search sort and my word cloud algorithm</title><content type='html'>I've added client-side user input validation and turned the search result headings into links, so that clicking a heading re-sorts the results by that criterion. 
For example, clicking on "Title" sorts the results in alphabetical order, and clicking it again toggles between ascending and descending order. I'd like to optimize how this happens though, because every time the results are re-sorted they are also re-calculated, which is unnecessary. One idea is to 'index' all the wiki pages before the extension runs, so that tags are generated for each page and stored in a database, which can then be searched much faster. That also means that the script which indexes the wiki pages should be run periodically, as the content is not static.&lt;br /&gt;&lt;br /&gt;I've also spent some time playing around with creating word clouds via PHP's GD library for image creation and my implementation of &lt;a href="http://chrisdone.com/wordcloudalgo/"&gt;this algorithm&lt;/a&gt;, but I realized that algorithm is too slow to be useful. The reason I'm writing my own implementation is that I previously used the Google WordCloud API, but the results it produces are not exactly what I wanted. The Google "word clouds" are simply lines of text (plain text, not an image), with a larger font size for words which occur more frequently. I wanted something more like &lt;a href="http://www.wordle.net/"&gt;wordle.net&lt;/a&gt; (but unfortunately that's closed-source). Since I couldn't find an open-source project which does the same thing (maybe I didn't search very extensively), today I wrote another implementation using the GD image library, and this time I created my own algorithm. 
Here's what it looks like with some random words:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/__iI3woR69XA/Sl4cs9cl3BI/AAAAAAAAABM/M6TR6Yd7hnU/s1600-h/myWordle-final.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 300px; height: 160px;" src="http://1.bp.blogspot.com/__iI3woR69XA/Sl4cs9cl3BI/AAAAAAAAABM/M6TR6Yd7hnU/s400/myWordle-final.png" alt="" id="BLOGGER_PHOTO_ID_5358752165351775250" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;And here's how it works using PHP and GD:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Get an array of (word =&gt; frequency) pairs.&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Randomize the array to produce different-looking word clouds on refresh. (And keep track of the relation between each word and its frequency).&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Initialize the image object, available colours, image background and font path via GD.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Slice the image into imaginary lines. The line height is the maximum font size used (which I calculate by getting the highest word frequency and multiplying it by a resize factor).&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/__iI3woR69XA/Sl4f4Zo0PPI/AAAAAAAAABU/5xvkUXi5pZA/s1600-h/myWordle0.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 300px; height: 150px;" src="http://2.bp.blogspot.com/__iI3woR69XA/Sl4f4Zo0PPI/AAAAAAAAABU/5xvkUXi5pZA/s400/myWordle0.png" alt="" id="BLOGGER_PHOTO_ID_5358755660432686322" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Keep track of current line number, starting at 0. Place the first word on the current line (line 0). 
I vary the x-coordinate randomly between 1 and 8 pixels from the left image border (so that consecutive lines don't appear to line up). The y-coordinate is equal to the word height.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/__iI3woR69XA/Sl4giD-XegI/AAAAAAAAABc/iEm5asWi2jM/s1600-h/myWordle1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 300px; height: 150px;" src="http://4.bp.blogspot.com/__iI3woR69XA/Sl4giD-XegI/AAAAAAAAABc/iEm5asWi2jM/s400/myWordle1.png" alt="" id="BLOGGER_PHOTO_ID_5358756376172001794" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Continue placing words on the current line, until the border is reached. Notice that the words stick to the top of the line, and they vary in size (proportional to their frequency), so it doesn't look like they're lined up. I also vary the colour randomly.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/__iI3woR69XA/Sl4hirhYjcI/AAAAAAAAABk/35OlBNea6fQ/s1600-h/myWordle2.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 300px; height: 150px;" src="http://3.bp.blogspot.com/__iI3woR69XA/Sl4hirhYjcI/AAAAAAAAABk/35OlBNea6fQ/s400/myWordle2.png" alt="" id="BLOGGER_PHOTO_ID_5358757486299483586" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Once the right border is reached, increment the variable storing the current line number, and repeat the process until all words are displayed.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/__iI3woR69XA/Sl4iWUgwGgI/AAAAAAAAABs/qZlIxRm6ksM/s1600-h/myWordle3.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 300px; height: 150px;" 
src="http://2.bp.blogspot.com/__iI3woR69XA/Sl4iWUgwGgI/AAAAAAAAABs/qZlIxRm6ksM/s400/myWordle3.png" alt="" id="BLOGGER_PHOTO_ID_5358758373476014594" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Finally, don't display the lines.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/__iI3woR69XA/Sl4inwpxM1I/AAAAAAAAAB0/5IdeyO3GGCo/s1600-h/myWordle-final.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 300px; height: 160px;" src="http://1.bp.blogspot.com/__iI3woR69XA/Sl4inwpxM1I/AAAAAAAAAB0/5IdeyO3GGCo/s400/myWordle-final.png" alt="" id="BLOGGER_PHOTO_ID_5358758673087804242" border="0" /&gt;&lt;/a&gt;Things that need to be improved: the sparsity of the word cloud is due to the fact that the code which determines how far "left" to place a word along a line is a bit of a hack - because I'm not sure how to calculate the varying font width.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-6218504387287838592?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/6218504387287838592/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/07/input-validation-search-sort-and-my.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/6218504387287838592'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/6218504387287838592'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/07/input-validation-search-sort-and-my.html' title='Input validation, 
search sort and my word cloud algorithm'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/__iI3woR69XA/Sl4cs9cl3BI/AAAAAAAAABM/M6TR6Yd7hnU/s72-c/myWordle-final.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-3341647018938760245</id><published>2009-07-03T12:29:00.004-04:00</published><updated>2009-07-03T13:02:28.049-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediawiki'/><category scheme='http://www.blogger.com/atom/ns#' term='tag extension'/><category scheme='http://www.blogger.com/atom/ns#' term='documentation'/><category scheme='http://www.blogger.com/atom/ns#' term='ajax'/><category scheme='http://www.blogger.com/atom/ns#' term='google'/><title type='text'>Adding features</title><content type='html'>Yesterday I finished implementing Ajax using Mediawiki's interface for Ajax calls. I'm happy to say it's working quite nicely now. I also rearranged some of the menus/links for (hopefully) easier navigation.&lt;br /&gt;&lt;br /&gt;I also looked into Google's Word Cloud API and added word clouds of the articles' content as another easy way to give the user visual representation of the similarities between articles. 
Google's documentation is fantastic, so this only took about 5 min to get it working :) (By the way, to avoid being dependent on Google's server availability, I added the API JS scripts and stylesheets to the server locally, so no connection to google takes place when the user decides to display a word cloud).&lt;br /&gt;&lt;br /&gt;Next, &lt;a href="http://lucene.apache.org/"&gt;Lucene&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;After that, there are several things I need to do before the extension will be ready for testing:&lt;br /&gt;&lt;ul type="square"&gt;&lt;li&gt;Add user input validation&lt;/li&gt;&lt;li&gt;Write the project documentation&lt;/li&gt;&lt;li&gt;Tidy up the source code and comment it&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Fix any bugs/issues that come up during testing&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-3341647018938760245?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/3341647018938760245/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/07/adding-features.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/3341647018938760245'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/3341647018938760245'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/07/adding-features.html' title='Adding features'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-2931720524182376974</id><published>2009-07-02T10:02:00.001-04:00</published><updated>2009-07-02T10:02:17.605-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediawiki'/><category scheme='http://www.blogger.com/atom/ns#' term='tag extension'/><category scheme='http://www.blogger.com/atom/ns#' term='myelink'/><category scheme='http://www.blogger.com/atom/ns#' term='ajax'/><title type='text'>AJAX and MW</title><content type='html'>I spent the last few days refactoring code and incorporating an XMLHttpRequest object to make the rendering of user-specified search options asynchronous. Since changing the options has to re-generate the search results, and generating the results was being handled by the main extension function, I had to either pass selected bits of the results to the external script called by XMLHttpRequest, which would then render the search options (for example, weighting of textual and structural search), or move all the processing of results into the external script. I decided to use the second approach because that would make for more readable code and would be a better design decision in the long run. Of course, that led to another problem, as the MediaWiki API is not available for use by the external script (and I was using the API to make calls to the MW database in order to fetch article content for making comparisons). After a few days of searching for a solution, it turned out that MW comes with &lt;a href="http://www.mediawiki.org/wiki/Manual:Ajax"&gt;support/interface for AJAX extensions&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;On a different note, I asked some friends for name suggestions, and I got Myelink (pronounced "my link"; it's a play on myelin and link), and here's the explanation I got: "the Myelin Sheath is a layer that increases the speed of neurotransmission by... 
a lot.. 1000x at times depending on the animal and location. Myelinated neurons are also famous for being in the white matter of brains, which serves as an information pathway for the brain and connects different parts and facilitates communication. Which is what the extensions is set out to do.." I like the connotation and since what I am currently focusing on is generating related pages with the aim to facilitate communication between different contributors of scientific experiments/articles, I think it fits nicely :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-2931720524182376974?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/2931720524182376974/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/07/ajax-and-mw.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/2931720524182376974'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/2931720524182376974'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/07/ajax-and-mw.html' title='AJAX and MW'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-7525824561398103059</id><published>2009-06-18T17:15:00.046-04:00</published><updated>2009-06-18T17:57:33.185-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='javascript'/><category scheme='http://www.blogger.com/atom/ns#' term='mediawiki'/><category scheme='http://www.blogger.com/atom/ns#' term='tag extension'/><category scheme='http://www.blogger.com/atom/ns#' term='ajax'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><title type='text'>Pages' previews and some technical details</title><content type='html'>I finished implementing the 'preview' feature, which allows a live embedded view of the pages listed by the extension. It took a while because of several parsing/markup rendering issues I ran into:&lt;br /&gt;&lt;ul type="square"&gt;&lt;li&gt;Dynamically fetching and displaying the articles' content - this was a bit of a problem because the 'fetching' part is done in PHP, whereas the displaying is done using JS - one is server-side and the other client-side, and I had to coordinate the two. I ended up pre-fetching the content of the articles concurrently with generating the results (i.e. as a page is found to be related, its content is obtained), and then passing that content as an argument to a JS function which can display it on the client side whenever the user clicks 'preview'. I need to look into AJAX in the future.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Displaying the generated page content - first, it had to be parsed from wiki markup to HTML; then the HTML special characters and entities had to be escaped and the result passed to the JS function; and from there it had to be unescaped so it could be displayed. For obtaining HTML from wiki markup I used a parser function:&lt;br /&gt;&lt;pre name="code"&gt;$html = $parser-&gt;parse($content, $pageTitle, new ParserOptions(), true, true)-&gt;getText();&lt;br /&gt;&lt;/pre&gt;Escaping the HTML was the easiest part because of PHP's built-in functions:&lt;pre name="code"&gt;$html = addslashes(htmlspecialchars(htmlentities($html, ENT_QUOTES)));&lt;br /&gt;&lt;/pre&gt;Unescaping the HTML from JavaScript proved a little more difficult. 
I ended up making use of this great resource: &lt;a href="http://phpjs.org/"&gt;php.js&lt;/a&gt; (a package porting PHP functions to JavaScript), so after including the file ($out-&gt;addScriptFile("php.js"), and editing the setup script to place it in /skins/common), all I had to do was use html_entity_decode and htmlspecialchars_decode.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Editing the displayed content - show only the section headings with a little snippet of text under each heading. Ideally, the snippet would be the part that is similar to the current page. Getting the content of particular sections is not hard to do using MediaWiki's API, but only when you know what the sections are. The problem is figuring out how many sections a page has. Someone on the MediaWiki IRC channel suggested using the _NUMBEROFHEADING_ magic word, but after looking into the documentation that doesn't seem to exist... or for some reason I can't find it. So I just decided to manually parse through the content and count the sections, which is a hack but it works.&lt;/li&gt;&lt;/ul&gt;Now that I've figured out how to do this, it seems pretty straightforward, but figuring it out took a bit of time - mostly because you can never be sure how the MediaWiki parser will decide to parse and output the text (wiki markup, HTML, or rendered HTML). Next I want to work on improving the accuracy of my actual algorithm (finish implementing all the criteria and test with some real pages), and after that there are several things to work on: Sarah's excellent suggestion, made during Tuesday's talk with Anita Sarma, that the algorithm for determining structural similarity can be used to detect emerging patterns and turn them into templates (thanks Sarah), and Anita's suggestion for a feature that allows cloning of entire pages (thanks Anita). 
I have an idea of how to do the latter, and I think I'll need some sort of persistent data store to implement the former and intelligently detect recurring structural patterns - that will be a lot more fun to code than the silly details I had to work out today.</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/7525824561398103059/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/pages-previews-and-some-technical.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/7525824561398103059'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/7525824561398103059'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/pages-previews-and-some-technical.html' title='Pages&apos; previews and some technical details'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-5670985213361160265</id><published>2009-06-15T12:03:00.000-04:00</published><updated>2009-06-15T15:47:02.594-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediawiki'/><category scheme='http://www.blogger.com/atom/ns#' term='tag extension'/><category scheme='http://www.blogger.com/atom/ns#' term='OWW'/><category scheme='http://www.blogger.com/atom/ns#' 
term='relatedness'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><category scheme='http://www.blogger.com/atom/ns#' term='screenshot'/><title type='text'>Updates</title><content type='html'>My basic algorithm for page-relatedness is now working and displaying a list of the top 5 most similar articles, with links to them (I'm currently working on the in-page preview). For now I've implemented this as a tag extension so that users can choose where in the page to place the feature (if at all). I considered the pros and cons of adding this feature as an automatic script which runs whenever a page is loaded and displays the results at the top of the page - on the one hand this is good because the user does not have to type in anything in order to invoke this function, and it automatically shows up on every page, by parsing the page content and comparing it for relatedness with the other pages on the wiki; but on the other hand, this could get annoying if you don't want it to show up all the time, and so I settled on the tag extension. This requires the user to remember the tag name ("&amp;lt;relatedpages&amp;gt;") and to insert it, but if the feature itself is useful, then this shouldn't be a problem.&lt;br /&gt;&lt;br /&gt;The algorithm I used for determining relatedness is different from the one I was considering using a few days ago. I decided to customize the search for related pages to the subject of the wikis - since this would be used to compare notes on computational science or biology experiments, I decided to think of ways to make the algorithm work better than a standard content search and comparison algorithm, by customizing it to the subject of the pages being compared. At the same time, 'relatedness' for notes can be characterized not only as similar subject matter but also as similar experiment type (i.e. related page layout and template).  
So the customization of the search which I wrote consists of the following: 1) Structural similarity and 2) Textual content similarity. For the first part, structural similarity depends on the types of markup tags used, on how often they're used, and on the relative position of those tags. To reflect that, I do several comparisons: a count of the number of occurrences of structural markup tags (these include both wiki markup like '==' for titles as well as allowed html tags like '&amp;lt;div&amp;gt;' and '&amp;lt;span&amp;gt;'); a comparison of the relative position of structural tags; and a comparison of the actual headings (within the '==' tags). More criteria could be added in the future, to improve the search further. For the second part of the similarity score, I thought about how to compare textual similarity and I came up with these two criteria (although again, more criteria almost certainly need to be added to fine-tune the search): first I compare the tokens of length greater than or equal to 8 characters (after trial-and-error and observing comparison results I settled on 8 characters as a sufficient token length, which ignores uninformative words such as "because", "through" etc. and keeps the most significant terms, like "nucleotide", "ligation", "restriction" etc.); then I also compare the capitalized tokens which occur on the page. The reason: a lot of methods, protocols and terms in biology (and science in general) are usually referred to by their abbreviations, e.g. FRAP, GFP, FACS, DNA, etc. For this reason, it makes sense to pay special attention to those tokens when comparing two experimental pages.&lt;br /&gt;&lt;br /&gt;To summarize the results, I assigned a weight to both scores - the first based on structure and the second based on textual similarity - and displayed both the cumulative and the individual breakdown scores on the wiki page. 
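&lt;br /&gt;&lt;br /&gt;The two-part score described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the extension's PHP, and it is not the actual implementation: the function names are hypothetical, Jaccard overlap is an assumed way of comparing token sets, and the structural part is reduced to comparing '==' headings only (tag counts and relative tag positions are left out).

```python
import re


def long_tokens(text, min_len=8):
    """Lower-cased alphabetic tokens with at least `min_len` characters."""
    return {w.lower() for w in re.findall(r"[A-Za-z]+", text) if len(w) >= min_len}


def acronyms(text):
    """All-caps tokens of two or more letters, e.g. FRAP, GFP, DNA."""
    return set(re.findall(r"\b[A-Z]{2,}\b", text))


def headings(text):
    """Lower-cased titles of '== Heading ==' style wiki headings."""
    found = re.findall(r"^==+\s*(.*?)\s*==+\s*$", text, flags=re.MULTILINE)
    return {h.lower() for h in found}


def overlap(x, y):
    """Jaccard overlap of two sets (an assumed measure); 0.0 if both are empty."""
    union = x.union(y)
    return len(x.intersection(y)) / len(union) if union else 0.0


def relatedness(a, b, w_structure=0.5, w_text=0.5):
    """Weighted sum of a structural score and a textual score, each in [0, 1]."""
    structural = overlap(headings(a), headings(b))
    textual = (0.5 * overlap(long_tokens(a), long_tokens(b))
               + 0.5 * overlap(acronyms(a), acronyms(b)))
    return w_structure * structural + w_text * textual
```

Keeping the two weights as parameters means the structural and textual parts can be re-balanced without touching the individual comparisons.&lt;br /&gt;&lt;br /&gt;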
&lt;a href="http://individual.utoronto.ca/myancheva/relatedpages_screenshot.png"&gt;Here's a screenshot&lt;/a&gt; of what the scores look like on the page.&lt;br /&gt;&lt;br /&gt;Note: As I'm not yet completely familiar with the Hadley wiki pages, I focused on testing the comparison algorithm with the biology-related experiments because they are available at OWW and because I have a vague understanding of them, having done some myself :-) To improve the algorithm in the future, however, I could add comparisons which are specific to climate change experiments (e.g. comparing the five-letter names of the experiment runs).&lt;br /&gt;&lt;br /&gt;Next: I'll add controls so the user can change the weights assigned to different aspects of the relatedness comparison - whether structural or textual (currently it's 50/50), and can drill down to see the particular criteria used in the comparison and the tokens which were found to be similar (so that he/she can judge whether to trust the comparison). I'll also implement the 'preview' feature for the related pages displayed, so that the user can see a preview of the sections of the page he/she is interested in without navigating away from the current page. The preview will also show the snippets of text which were found to be related to the current page, allowing the user to easily cut and paste information as necessary. To that end, I'm also working on implementing 'copy', 'paste', 'delete' links on the section headings (where 'edit' is currently located) - I added the links yesterday but I need to figure out how to edit/delete the sections when the user clicks 'delete' or 'paste' (the way it currently works there's no framework for deleting and pasting content into individual sections so the UI has to go through the Edit page, which is what I'm trying to avoid because it's time-consuming. 
I think I'll have to directly query the db and change the page content from there).&lt;br /&gt;&lt;br /&gt;On a more technical note, here's the mediawiki code I've used so far for implementing different features (this was probably the most time-consuming part, as there are very, very few examples of using mediawiki Hooks):&lt;br /&gt;&lt;ul type="square"&gt;&lt;li&gt;To create a tag extension, such that it displays content when the user enters the tag name &amp;lt;extensiontag&amp;gt; in the wiki text, I registered a parser hook extension and used the setHook('extensiontag', functionToRender) function; example:&lt;br /&gt;&lt;pre name="code"&gt;&lt;br /&gt;/* Note: common abbreviations added to variable and function names:&lt;br /&gt;* wf = wiki function, wg = wiki globals,&lt;br /&gt;* ef = extension function */&lt;br /&gt;&lt;br /&gt;/* Place this within the extension setup function. */&lt;br /&gt;&lt;br /&gt;global $wgParser;&lt;br /&gt;&lt;br /&gt;/* Register the tag name &amp;lt;relatedpages&amp;gt;. efExtensionNameRender is the function called&lt;br /&gt;* when the user uses the &amp;lt;relatedpages&amp;gt; tag. */&lt;br /&gt;$wgParser-&gt;setHook( 'relatedpages', 'efExtensionNameRender' );&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;To load content on every page when it is viewed, without having the user input or type anything, use one of the &lt;a href="http://www.mediawiki.org/wiki/Manual:MediaWiki_hooks#Hooks_grouped_by_function"&gt;page rendering hooks&lt;/a&gt;. Depending on the exact time when you want the content to be displayed, you could use ParserBeforeStrip, ParserAfterStrip, etc. 
I found that in order for html tags (like &amp;lt;a href&amp;gt;, which is not one of the &lt;a href="http://meta.wikimedia.org/wiki/Help:HTML_in_wikitext"&gt;recognized tags&lt;/a&gt; when wiki text is parsed) to be rendered as html, I had to use the &lt;a href="http://www.mediawiki.org/wiki/Manual:Hooks/InternalParseBeforeLinks"&gt;InternalParseBeforeLinks&lt;/a&gt; hook.&lt;/li&gt;&lt;li&gt;To add a link on section headers (where 'edit' is located), I used the DoEditSectionLink hook. (The other two similar hooks - EditSectionLink and EditSectionOtherLinks - are deprecated in newer versions. However, DoEditSectionLink is available only from version 1.14 onward, and the mediawiki version which can be downloaded from the ubuntu repositories is 1.11, so I had to install the latest version from source to have it work. That said, running several different versions of mediawiki is quite easy and fast to set up - I set them up to use the same database backend so sharing of page content between them is instantaneous.) When creating a new link I also had to add the tooltip message/hint to the system messages displayed by mediawiki. These are located in mediawiki_dir/languages/messages/MessagesEn.php (or &lt;span style="font-style: italic;"&gt;Messages**.php&lt;/span&gt;). They are changed not by editing that file, but by using the addMessages() function on the $wgMessageCache global object. 
Example:&lt;br /&gt;&lt;pre name="code"&gt;&lt;br /&gt;/* Place this in the extension setup function */&lt;br /&gt;global $wgHooks;&lt;br /&gt;$wgHooks['DoEditSectionLink'][] = 'efExtensionNameDoEditSectionLink';&lt;br /&gt;/* end of extension setup function */&lt;br /&gt;&lt;br /&gt;function efExtensionNameDoEditSectionLink($skin, $title, $section, $tooltip, &amp;amp;$result) {&lt;br /&gt; global $wgMessageCache;&lt;br /&gt; $wgMessageCache-&gt;addMessages( array("copysection" =&gt; "copy") );&lt;br /&gt;&lt;br /&gt; // Add tooltips (hint shown on hover over link).&lt;br /&gt; $wgMessageCache-&gt;addMessages( array("copysectionhint" =&gt; "Copy section: $1") );&lt;br /&gt;&lt;br /&gt; // This only opens an edit page...I need to figure out how to modify this to edit&lt;br /&gt; // the page without going through the edit page.&lt;br /&gt; $copyQuery = array('action' =&gt; 'edit', 'section' =&gt; $section);&lt;br /&gt;&lt;br /&gt; $copyAttribs = array();&lt;br /&gt; if ($tooltip) {&lt;br /&gt;    $copyAttribs['title'] = wfMsg('copysectionhint', $tooltip);&lt;br /&gt; }&lt;br /&gt; $options = array('known', 'noclasses');&lt;br /&gt;&lt;br /&gt; // this actually adds the link:&lt;br /&gt; $copy_url = $skin-&gt;link($title, wfMsg('copysection'), $copyAttribs,&lt;br /&gt;                          $copyQuery, $options);&lt;br /&gt; $result .= "&amp;lt;span class='editsection'&amp;gt;[".$copy_url."] &amp;lt;/span&amp;gt;";&lt;br /&gt; return true;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;To add CSS/JS scripts (using the BeforePageDisplay hook); example:&lt;br /&gt;&lt;pre name="code"&gt;&lt;br /&gt;/* Place this in the extension setup function to register the hook */&lt;br /&gt;global $wgHooks;&lt;br /&gt;$wgHooks['BeforePageDisplay'][] = 'efExtensionNameBeforePageDisplay';&lt;br /&gt;/* end of extension setup function */&lt;br /&gt;&lt;br /&gt;function efExtensionNameBeforePageDisplay($out) {&lt;br /&gt;   // NB: put 'styles.css' in the default styles 
directory (usually mediawiki_dir/skins/)&lt;br /&gt;   $out-&gt;addStyle("styles.css");&lt;br /&gt;   $out-&gt;addScript(&lt;br /&gt;&amp;lt;&amp;lt;&amp;lt;EOD&lt;br /&gt;&amp;lt;script type="text/javascript"&amp;gt;&lt;br /&gt;      /* more javascript here */&lt;br /&gt;      &amp;lt;/script&amp;gt;&lt;br /&gt;EOD&lt;br /&gt;   );&lt;br /&gt;  return true;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a href="http://svn.wikimedia.org/doc/classOutputPage.html"&gt;&lt;/a&gt;Other notes:&lt;br /&gt;&lt;a href="http://meta.wikimedia.org/wiki/Help:System_message"&gt;System messages&lt;/a&gt; can be accessed from Special:Allmessages. The file is located in mediawiki_dir/languages/messages/MessagesEn.php.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://bugs.php.net/bug.php?id=32671"&gt;PHP bug #32671&lt;/a&gt; (have to convert float keys to string for associative arrays).&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.w3.org/TR/XMLHttpRequest/"&gt;XMLHttpRequest object&lt;/a&gt; - a DOM API used to send an HTTP request with a query string to a python file from a JS script. 
The python script uses the handler() method which expects a req parameter representing the request.&lt;br /&gt;&lt;br /&gt;&lt;/relatedpages&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-5670985213361160265?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/5670985213361160265/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/updates.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/5670985213361160265'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/5670985213361160265'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/updates.html' title='Updates'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-7734171483022717023</id><published>2009-06-08T16:02:00.003-04:00</published><updated>2009-06-08T16:03:24.302-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediawiki'/><category scheme='http://www.blogger.com/atom/ns#' term='tag extension'/><category scheme='http://www.blogger.com/atom/ns#' term='relatedness'/><title type='text'>Similarity Algorithms: A measure of relatedness</title><content type='html'>As I'm building a tag extension for displaying a list of related wiki pages on mediawiki, I started thinking about how to quantify the relatedness between 
two pages, and I looked at several algorithms and research papers on the subject of text similarity and comparison. This is not as trivial as it seems, and a number of researchers are interested in the area, though most applications are related either to testing for copied work in student submissions in CS courses or to search engines. The first category of text comparison algorithms I read about focuses on comparing code rather than natural language, but there are several key principles which could be adapted to the latter context. The second category (used in search engines) uses a number of strategies such as word-based, keyword-based, &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9367"&gt;n-gram&lt;/a&gt; and &lt;a href="http://lsa.colorado.edu/"&gt;LSA (latent semantic)&lt;/a&gt; analysis. N-gram analysis focuses on categorizing text: it calculates a text's profile and compares its distance from each category profile. LSA stores word-context data matrices, and calculates similarity between the resulting word and context vectors based on the cosine of the angle between them (as unintuitive as this may sound, the results obtained from this method measure up quite well with human-made comparisons).&lt;br /&gt;&lt;br /&gt;Text is parsed into essential tokens which are then compared. An important point is to choose a suitable minimum length for common substrings - if this value is too small then the reported similarity may be too high and vice versa. Thus, the similarity &lt;span style="font-style: italic;"&gt;S&lt;sub&gt;N&lt;/sub&gt;(A,B)&lt;/span&gt; between two texts A and B can be defined as the percentage of text in A which can be constructed by combining essential tokens from B of length greater than or equal to N. 
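&lt;br /&gt;&lt;br /&gt;As a rough illustration of this definition, here is a minimal Python sketch. It rests on simplifying assumptions that are mine, not the papers': it works on raw characters instead of parsed tokens, and the function name is hypothetical. It reports the fraction of A that lies inside some substring of length at least N which also occurs anywhere in B.

```python
def similarity(a, b, n):
    """Fraction of the characters of `a` that lie inside a substring of
    length at least `n` which also occurs somewhere in `b`."""
    if not (n >= 1 and len(a) >= n):
        return 0.0
    covered = [False] * len(a)
    # Any common substring of length >= n is a union of length-n windows
    # that each occur in b, so checking every length-n window is enough.
    for start in range(len(a) - n + 1):
        if a[start:start + n] in b:
            for i in range(start, start + n):
                covered[i] = True
    return sum(covered) / len(a)
```

Because the windows may match anywhere in B, the measure is order-insensitive; raising n can only shrink the covered part of A, so the score never increases as n grows.&lt;br /&gt;&lt;br /&gt;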
As N is increased, the similarity S decreases; however, a distinction between non-related and related texts can still be made, since S for related texts decreases more slowly than S for non-related ones (e.g. for N=24, S&lt;sub&gt;related&lt;/sub&gt; = 75-80% whereas S&lt;sub&gt;non-related&lt;/sub&gt; = 20-50%). The similarity tester can be made flexible by finding the set of longest common substrings (order-insensitive), unlike the unix diff utility which finds the longest common subsequence (order-sensitive).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Comparing wiki pages - algorithm:&lt;/span&gt;&lt;br /&gt;&lt;ul type="square"&gt;&lt;li&gt;Since processing of page content to determine relatedness may be a slow process, I'll first filter the pages with similar titles (admittedly this method is not perfect, but it works as a prototype). I considered using something like levenshtein() or similar_text() to compare title strings, but these functions compare char-by-char, which is not very meaningful and fails to notice global similarities in titles. So I've decided to explode() the $pagetitle into an array of title words (say, $titleArray), then go through every other wiki page title, and determine similarity by counting the number of elements of $titleArray contained in a page title, then scale the result by dividing the length of the common tokens by the length of the page title (in chars). The results are stored in the $simTitles array, as (similarity_score =&gt; page_title) key-value pairs. 
Once this is accomplished, krsort() the $simTitles array, and proceed with examining the page content of the first 10 (or maybe fewer) results.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;For the current page (A) and for each page in the array returned from step 1 (B), read page_content from the mediawiki db for both A and B and parse into essential tokens based on &lt;a href="http://www.cs.vu.nl/%7Edick/sim.html"&gt;this SIM algorithm&lt;/a&gt;; In addition to comparing substrings, also compare the highest-frequency words of length greater than 5 (or more) chars (This comparison of lengthier terms would be characteristic of the application of the algorithm to scientific texts, as there are usually terms which may appear often even if the substring analysis does not indicate relatedness -- the reason the SIM algorithm worked well is because its application was primarily to C code, where code in copied assignments would indeed share common substrings where the core algorithm of the program is involved. The difference in application therefore warrants a difference in analysis because while computer languages provide a limited set of allowed expressions, natural language allows greater variation even if the subject of the page is closely related).&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Display pages with S &gt;= 80%, in descending order of similarity. (This percentage value was found to provide consistently accurate results according to research). 
Format output to the wiki page (similar in style to the Recent Changes display on OWW).&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Reading:&lt;/span&gt;&lt;a href="http://www.cs.vu.nl/%7Edick/sim.html"&gt;&lt;br /&gt;&lt;/a&gt;&lt;ul type="square"&gt;&lt;li&gt;&lt;a href="http://www.cs.vu.nl/%7Edick/sim.html"&gt;Dick Grune and Matty Huntjens' research on Detecting copied submissions in computer science workshops at Vrije Universiteit Amsterdam&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.socsci.uci.edu/%7Emdlee/lee_pincombe_welsh_document.pdf"&gt;Lee, Pincombe and Welsh - An Empirical Evaluation of Models of Text Document Similarity&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://lsa.colorado.edu/"&gt;LSA (latent semantic analysis)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://answers.google.com/answers/threadview?id=337832"&gt;Text/HTML Similarity Algorithms&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-7734171483022717023?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/7734171483022717023/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/similarity-algorithms-measure-of.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/7734171483022717023'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/7734171483022717023'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/similarity-algorithms-measure-of.html' title='Similarity Algorithms: A measure of relatedness'/><author><name>maria 
yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-2427225192733714172</id><published>2009-06-04T15:19:00.003-04:00</published><updated>2009-06-04T15:35:25.369-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediawiki'/><category scheme='http://www.blogger.com/atom/ns#' term='API'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><title type='text'>Lack of documentation</title><content type='html'>Looking through the mediawiki API documentation (where it exists) is frustrating (mostly because it is in the form of a reference, rather than a manual -- it's useful only once you've figured it out on your own) so I've been reading through the source code and figuring out which functions can be used for what. Examples of using the API in actual code are scarce. Here's what I've found so far:&lt;br /&gt;&lt;ul type="square"&gt;&lt;li&gt;On one hand, you could access the database (and do other things, like editing pages, authentication, etc.) by sending a query string in a URL through an HTTP request to the api.php file (which is one of the main access points). Here's how this method could be implemented:&lt;br /&gt;&lt;pre name="code" class="php"&gt;$api_dir = '__wiki_url__/api.php'; // __wiki_url__ is the base URL of the MediaWiki installation&lt;br /&gt;$query_options = array('action' =&gt; 'query',&lt;br /&gt;     'titles' =&gt; 'Main Page',&lt;br /&gt;     'prop' =&gt; 'info',&lt;br /&gt;     'list' =&gt; 'allpages',&lt;br /&gt;     'meta' =&gt; 'userinfo',&lt;br /&gt;     'format' =&gt; 'xmlfm');&lt;br /&gt;$query_string = http_build_query($query_options);&lt;br /&gt;$query_url = $api_dir . '?' . 
$query_string;&lt;br /&gt;$results = get_url($query_url); // get_url is a function I wrote which uses cURL&lt;br /&gt;&lt;/pre&gt;What this does: returns information about Main_Page (a page on the wiki), along with a list of all pages on the wiki, and metadata about the current user; the output is in the XMLfm format (the fm suffix means pretty-printed HTML output), and it looks like this:&lt;br /&gt;&lt;pre name="code"&gt;&amp;lt;api&amp;gt;&lt;br /&gt;&amp;lt;query&amp;gt;&lt;br /&gt;  &amp;lt;pages&amp;gt;&lt;br /&gt;    &amp;lt;page pageid="1" ns="0" title="Main Page" touched="2009-06-01T18:16:21Z" lastrevid="14" counter="15" length="537"&amp;gt;&lt;br /&gt;    &amp;lt;/page&amp;gt;&lt;br /&gt;  &amp;lt;/pages&amp;gt;&lt;br /&gt;  &amp;lt;allpages&amp;gt;&lt;br /&gt;    &amp;lt;p pageid="1" ns="0" title="Main Page"&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;    &amp;lt;p pageid="2" ns="0" title="My test extensions"&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;    &amp;lt;p pageid="3" ns="0" title="SomeNewPage"&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;  &amp;lt;/allpages&amp;gt;&lt;br /&gt;  &amp;lt;userinfo name="127.0.0.1" anon=""&amp;gt;&lt;br /&gt;  &amp;lt;/userinfo&amp;gt;&lt;br /&gt;&amp;lt;/query&amp;gt;&lt;br /&gt;&amp;lt;/api&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;You could also specify 'format' to be 'php', and then use php's unserialize() function to access the data. More about the query options can be found here: &lt;a href="http://www.mediawiki.org/wiki/API:Query"&gt;MediaWiki API Query&lt;/a&gt;. While this method works, it requires quite a bit of code just to set up the query url, options etc., and it seems there must exist a higher level of abstraction, so I continued looking for another method of accessing the API - see below.&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;The second method: instantiating the classes defined in the source code docs. 
Since there is no manual, read the source code here: &lt;a href="http://svn.wikimedia.org/doc/files.html"&gt;http://svn.wikimedia.org/doc/files.html&lt;/a&gt; (I found index.php, Article.php, api.php, User.php, Database.php and PageHistory.php particularly useful to read; it's also useful to look at the Classes listing separately). Then use the classes in your code, like so:&lt;br /&gt;&lt;pre name="code"&gt;// Accessing the database to get a list of all the user names.&lt;br /&gt;$db_read = wfGetDB(DB_SLAVE); // DB_SLAVE for read access, DB_MASTER for write access&lt;br /&gt;$res = $db_read-&gt;query("SELECT user_name FROM user");&lt;br /&gt;$string = "";&lt;br /&gt;// Append one "User: name;; " entry per row returned.&lt;br /&gt;while ( $row = $db_read-&gt;fetchObject($res) ) {&lt;br /&gt;  $string .= "User: ".$row-&gt;user_name . ";; ";&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;This builds a string like: &lt;pre name="code"&gt;User: WikiAdmin;;&lt;/pre&gt;For a quick intro to querying the database, and a few examples, see &lt;a href="http://www.librarywebchic.net/wordpress/2006/05/08/extending-mediawiki/"&gt;this blog post&lt;/a&gt; (note that the examples from that blog use the built-in insert(), update() etc. functions &lt;strike&gt;whereas I find it easier to use the query() function and pass it an actual SQL query, as in the snippet above&lt;/strike&gt;. It's better to use those wrapper functions because they make the code more flexible and independent of the underlying database backend. That said, query() is simpler to use because you don't have to learn any new syntax.); there's also a hard-to-find mediawiki page on &lt;a href="http://www.mediawiki.org/wiki/Manual:Database_access"&gt;Database Access&lt;/a&gt;. 
And here's the &lt;a href="http://upload.wikimedia.org/wikipedia/commons/4/41/Mediawiki-database-schema.png"&gt;database schema&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;The most useful site so far has been &lt;a href="http://organicdesign.co.nz/MediaWiki_code_snippets"&gt;&lt;span style="text-decoration: underline;"&gt;MediaWiki Code Snippets&lt;/span&gt;&lt;/a&gt;, which gives many examples of using the API (something that's completely lacking on all of the mediawiki Manual pages).&lt;br /&gt;&lt;br /&gt;And something else: querying the database is not necessary when the information you want is available through the global variables: &lt;a href="http://www.mediawiki.org/wiki/Manual:Wg_variable"&gt;wg variables&lt;/a&gt; (also a complete listing of &lt;a href="http://www.mediawiki.org/wiki/Manual:Configuration_settings"&gt;configuration variables here&lt;/a&gt;) and &lt;a href="http://www.mediawiki.org/wiki/Manual:Global_object_variables"&gt;global object variables&lt;/a&gt;. It took a while to find a mention of this on the mediawiki pages, but these links provide more information. Typically, there are a number of wg (wiki global) variables defined in DefaultSettings.php and LocalSettings.php. 
Recently, however, Global Object Variables have become more widely used (for good reason), because they allow objects to be instantiated and used through their methods without changing the global wg variables.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-2427225192733714172?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/2427225192733714172/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/lack-of-documentation.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/2427225192733714172'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/2427225192733714172'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/lack-of-documentation.html' title='Lack of documentation'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-6209386826089300496</id><published>2009-06-03T14:15:00.001-04:00</published><updated>2009-06-03T14:15:58.294-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediawiki'/><category scheme='http://www.blogger.com/atom/ns#' term='tag extension'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><title type='text'>Making progress</title><content type='html'>Today I 
continued work on the code for the mediawiki extension - I wrote the python setup script which builds the extension directory by copying the correct scripts and modules to the correct subdirectories and applies the user config settings specific to the local mediawiki installation. The build script works nicely. What became a minor problem was the way the HTML is rendered in the wiki page. Here's what happens: I build my extension, add it to the mediawiki extensions, and use the &amp;lt;extensiontag&amp;gt; within a wiki page; the result: the parser seems to apply pre-formatting to the HTML returned -- the formatted (human-readable) HTML and JS code in my python script are displayed in a similar formatted fashion when rendered (eg: having two links on new lines in the HTML code will render the links on new lines in the wiki page itself, even if there's no explicit new line tag). This is a tad annoying; seems related to &lt;a href="https://bugzilla.wikimedia.org/show_bug.cgi?id=8997"&gt;Mediawiki Bug #8997 on Bugzilla&lt;/a&gt; (which was mentioned in the statsjam source code...I'll take a look at the work-around used there). On a different note, it's quite easy to add dynamic content by embedding some JS into the HTML template returned by the CGI script (which makes me think I should look into visualization APIs for creating a nicer UI). The first feature I've started working on is displaying a list of related users/related wiki pages. 
For that, I need to figure out how to query the mediawiki database for obtaining data about the collection of wiki pages, and to come up with criteria for what defines 'relatedness' between two users.&lt;br /&gt;&lt;br /&gt;I'm also making a list of use cases and sketching little hand-drawn snapshots of what the interface would look like (more on that tomorrow when I go over it with Steve).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-6209386826089300496?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/6209386826089300496/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/making-progress.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/6209386826089300496'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/6209386826089300496'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/making-progress.html' title='Making progress'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-5195228639803542476</id><published>2009-06-02T20:26:00.039-04:00</published><updated>2009-06-03T01:24:22.014-04:00</updated><title type='text'>Interesting collaboration tool</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://1.bp.blogspot.com/__iI3woR69XA/SiX94RS0ukI/AAAAAAAAAAc/DPeEPwtN0lg/s1600-h/google_wave_logo.png"&gt;&lt;img style="border:0; margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 126px; height: 126px;" src="http://1.bp.blogspot.com/__iI3woR69XA/SiX94RS0ukI/AAAAAAAAAAc/DPeEPwtN0lg/s320/google_wave_logo.png" alt="" id="BLOGGER_PHOTO_ID_5342955676101818946" border="0" /&gt;&lt;/a&gt;While taking a break from writing PHP today, I came across the &lt;a href="http://wave.google.com/"&gt;Google Wave Developer Preview&lt;/a&gt; from Google I/O 2009. Google Wave is an interesting new tool for integrating email, chat, blogging, wikis and other forms of web communication into a seamless live conversation stream. Features include threading of the message waves (can't really call them email threads, as they can be emails, blog comments, 'wiki'-style notes - all handled as 'waves' sent to the list of recipients), live view of new messages as they are being typed (thus practically a chat as well), editing of own or others' messages (essentially a live wiki thread), and ability to integrate with external websites and mobile devices with embedded 'waves' (such that users can comment on a dozen blogs and see all external comments and discussions integrated as conversations in their Wave client, which is synchronized in real-time). The Wave client acts as a central collection of all of the user's communication on the web. Sounds cool. For the integration alone, it's useful. 
And it's all open source with a nice API.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-5195228639803542476?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/5195228639803542476/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/interesting-collaboration-tool.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/5195228639803542476'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/5195228639803542476'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/interesting-collaboration-tool.html' title='Interesting collaboration tool'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/__iI3woR69XA/SiX94RS0ukI/AAAAAAAAAAc/DPeEPwtN0lg/s72-c/google_wave_logo.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-3237868542377109212</id><published>2009-06-01T15:55:00.000-04:00</published><updated>2009-06-01T16:29:35.857-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediawiki'/><category scheme='http://www.blogger.com/atom/ns#' term='tag extension'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><title type='text'>The beginnings 
of a tag extension</title><content type='html'>Today I started writing the PHP files for the MediaWiki extension I'll be making. The most important thing right now is to make sure the interface between the PHP and Python files works, so for now I focused on writing the PHP &lt;span style="font-style: italic;"&gt;'render' function &lt;/span&gt;(part of the main PHP setup file), which just sends an HTTP request to an &lt;span style="font-style: italic;"&gt;interface.py&lt;/span&gt; file; that Python file then renders the HTML and writes it to the HTTP connection. There were some standard things to take care of in the PHP setup file, like registering the extension features in MediaWiki (I used 'parserhook' as the extension class, as I'm currently test-implementing an XML tag; registering it makes sure it shows up on Special:Version), and setting up the parser hook to 'hook' the tag name to the rendering function. I tested my tag on LocalWiki (my local MediaWiki installation) and it works as expected (right now it just displays back the $input and $args). Next I'll start coding the actual Python modules, so that the tag will start doing something more interesting (and more useful) than just dumping back the user input.&lt;br /&gt;&lt;br /&gt;I also spent some time today trying to install statsjam locally to see how the user interface works, but it seems like there's an import error in one of the Python files. 
(I decided it's not worthwhile to spend more time trying to fix it since I still have the source files to look at).&lt;br /&gt;&lt;br /&gt;Meanwhile, I'm also installing some of the extensions which are already being used at OWW (&lt;a href="http://openwetware.org/wiki/Special:Version"&gt;Special:Version&lt;/a&gt;), with the idea of trying them out to figure out first-hand what's missing (keeping in mind the snapshots of the Hadley centre workflow tasks and the interviews which I've been reading through).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-3237868542377109212?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/3237868542377109212/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/beginnings-of-tag-extension.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/3237868542377109212'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/3237868542377109212'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/06/beginnings-of-tag-extension.html' title='The beginnings of a tag extension'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-8656479789126403383</id><published>2009-05-29T17:01:00.010-04:00</published><updated>2009-05-29T18:54:29.041-04:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='mediawiki'/><category scheme='http://www.blogger.com/atom/ns#' term='OWW'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><title type='text'>Exploring the options</title><content type='html'>Today I've been going back and forth between writing several simple PHP scripts and test-running them through my Apache server, and exploring the possibility of creating a MediaWiki extension in Python with a minimal amount of PHP coding required.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://www.mediawiki.org/wiki/Manual:Extensions"&gt;MediaWiki manual&lt;/a&gt; page on extension writing explains that extensions all consist of three conceptual parts - the setup file, the execution file, and a file with internationalization information about the extension. Of course, that manual suggests all of these parts are to be written in PHP; however, by looking at the statsjam source code (and getting a helpful response from its main developer, thanks Jon!) I think it will be possible to write the main execution code in Python and then interface that with the setup.php file by way of an HTTP request to the Python CGI script. It seems like a fairly straightforward method so I'll give it a test try with a simple Python script later today.&lt;br /&gt;&lt;br /&gt;Meanwhile, I've been discussing ideas about features with OWW's developer. Still looking into ways to collaborate more effectively. 
We'll probably take the discussion over to an OWW page, where users could contribute ideas as well (one already exists: &lt;a href="http://openwetware.org/wiki/OpenWetWare:Software/Extensions"&gt;here&lt;/a&gt;).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-8656479789126403383?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/8656479789126403383/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/exploring-options.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/8656479789126403383'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/8656479789126403383'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/exploring-options.html' title='Exploring the options'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-5900600868916213358</id><published>2009-05-28T16:30:00.004-04:00</published><updated>2009-05-28T23:09:19.040-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediawiki'/><category scheme='http://www.blogger.com/atom/ns#' term='LAMP'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='climate change'/><category scheme='http://www.blogger.com/atom/ns#' term='php'/><category 
scheme='http://www.blogger.com/atom/ns#' term='apache'/><category scheme='http://www.blogger.com/atom/ns#' term='hadley centre'/><title type='text'>A case study of the Hadley centre + more reading</title><content type='html'>&lt;span id="b24-booktitle-12453"&gt;&lt;span&gt;Today's meeting with Steve cleared up at least part of the questions we had about the daily (software-related) challenges facing the scientists at the Hadley centre.&lt;/span&gt;&lt;span&gt;&lt;/span&gt;&lt;span style="font-style: italic;"&gt; &lt;/span&gt;He showed us a case study of a scientist accomplishing a particular task, and the difficulties he met along the way, with screenshots of the content he was dealing with. While I won't be directly involved with writing software tools specific to the Hadley centre (see previous posts), I found today's discussion really interesting in that it gave me a more accurate idea of the tasks they need to accomplish, and the tools they are missing. One of the things I found quite striking is their method of keeping track of climate simulations. Given a general climate model, say &lt;span&gt;the&lt;/span&gt;&lt;span style="font-style: italic;"&gt; &lt;/span&gt;&lt;span&gt;HadGEM3, they run countless simulations (either for a particular subsystem - eg atmosphere, or a coupled model - eg atmosphere and oceans) by using different variables and parametrizing various subsystems. The complex relationship between the simulations, and the reasons for running simulation &lt;span style="font-style: italic;"&gt;'adghk'&lt;/span&gt; after simulation &lt;span style="font-style: italic;"&gt;'admnl'&lt;/span&gt; (their computer-generated names are completely arbitrary), as well as the particular changes made from one run to another, are documented in.. well, a hand-written list on a plain wiki page. 
It's frustrating just to think of the better things these scientists could be using their time for, as opposed to painstakingly writing out lists of simulation metadata (like simulation names and short descriptions - which could be automatically generated).&lt;br /&gt;&lt;br /&gt;Another issue is the lack of documentation of ancillary files (i.e. files needed for running the simulations, which include supplemental data). A real problem occurs when one person commits an ancillary file to the repository, say a new orography &lt;/span&gt;&lt;span&gt;&lt;/span&gt;&lt;span&gt;model, and forgets to include essential information about configuration changes which must be made in order for that file to work correctly. Another person tries to use said ancillary orographic file and his simulation crashes because it was actually necessary to change the value of the gravity wave constant... which the first person who committed the orographic file should have documented in the first place! Such poor documentation can lead to sidetracks of a few days to a few weeks, and in general, comments about a file are written only after the lack of documentation has already caused a problem.&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span&gt;These are interesting challenges to tackle, especially the latter, since the knowledge of which parameters need to be changed with which file is (it seems) something that is difficult to generate automatically. 
At the same time, you can't rely on scientists to always add the appropriate comments to all their files because humans suffer from a general dislike of documentation.&lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;I spent the rest of the day doing some reading, and I also got around to installing mediawiki.&lt;span style="font-style: italic;"&gt;&lt;br /&gt;&lt;br /&gt;Some books I'm reading today:&lt;/span&gt;&lt;br /&gt;Beginning PHP5&lt;br /&gt;&lt;/span&gt;&lt;span id="b24-booktitle-12453"&gt;Professional LAMP: Linux, Apache, MySQL, and PHP5 Web Development&lt;br /&gt;Apache Server Unleashed&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;For reference, here's the output of the mediawiki installation:&lt;br /&gt;&lt;dl&gt;&lt;dd&gt;Generating configuration file...Database type: MySQL&lt;br /&gt;Loading class: DatabaseMysql&lt;br /&gt;Attempting to connect to database server as root...success.&lt;br /&gt;Connected to 5.0.51a-3ubuntu5.4&lt;br /&gt;Attempting to create database...&lt;br /&gt;Created database &lt;tt&gt;wikidb&lt;/tt&gt;&lt;br /&gt;Creating tables... 
done.&lt;br /&gt;Initializing data...&lt;br /&gt;Created sysop account &lt;tt&gt;WikiAdmin&lt;/tt&gt;.&lt;br /&gt;Creating LocalSettings.php...&lt;/dd&gt;&lt;/dl&gt;Config file: /etc/mediawiki/LocalSettings.php&lt;br /&gt;Setting permissions: chmod go-rw LocalSettings.php&lt;br /&gt;Path to mediawiki: http://localhost/mediawiki/index.php&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-5900600868916213358?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/5900600868916213358/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/case-study-of-hadley-centre-more.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/5900600868916213358'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/5900600868916213358'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/case-study-of-hadley-centre-more.html' title='A case study of the Hadley centre + more reading'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-5112799481995330621</id><published>2009-05-27T20:57:00.014-04:00</published><updated>2009-05-29T18:56:44.466-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mediawiki'/><category scheme='http://www.blogger.com/atom/ns#' term='API'/><category 
scheme='http://www.blogger.com/atom/ns#' term='php'/><category scheme='http://www.blogger.com/atom/ns#' term='apache'/><title type='text'>MediaWiki and PHP</title><content type='html'>During the past two days I have been doing lots of reading to get familiar with the MediaWiki engine and PHP in general. This is hardly worthy of a post, but here's what I've been looking at for the sake of completeness:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.mediawiki.org/wiki/Development_policy"&gt;MediaWiki Development Policy&lt;/a&gt;&lt;br /&gt;&lt;a href="http://svn.wikimedia.org/doc/"&gt;MediaWiki API documentation&lt;/a&gt;&lt;br /&gt;&lt;a href="http://svn.wikimedia.org/viewvc/mediawiki/"&gt;MediaWiki source code&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.php.net/manual/en/"&gt;The official PHP Manual&lt;/a&gt;&lt;br /&gt;A book on PHP5/MySQL Programming&lt;br /&gt;&lt;br /&gt;While I was at it, I also set up an Apache server - I figured it would be handy to have it configured so I can install the MediaWiki engine locally. It turned out more time-consuming than I expected: I just spent a few hours installing various dependencies.&lt;br /&gt;&lt;br /&gt;A note about the LAMP-server installation: if installed from source, Apache creates a nice server root directory where everything is located (however, you have to manually add the modules for PHP, Python, etc.). 
If installed from apt-get, the modules are easy to install and it is easy to get Apache to work with PHP and MySQL, but the Apache directories are scattered across several locations.&lt;br /&gt;&lt;br /&gt;For quick reference, the following are the equivalents of the source installation directories:&lt;br /&gt;/etc/apache2/apache2.conf = rootdir/conf/httpd.conf and other config files&lt;br /&gt;/etc/apache2/mods-available and enabled = rootdir/modules&lt;br /&gt;/var/www = rootdir/htdocs&lt;br /&gt;/var/log/apache2 = rootdir/log (error and access logs)&lt;br /&gt;/usr/lib/cgi-bin = rootdir/cgi-bin&lt;br /&gt;/etc/init.d/apache2* = rootdir/bin/apachectl*&lt;br /&gt;&lt;br /&gt;Other config files:&lt;br /&gt;php config - /etc/php5/apache2/php.ini&lt;br /&gt;mysql config - /etc/mysql/my.cnf&lt;br /&gt;mediawiki - /etc/mediawiki/apache.conf&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-5112799481995330621?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/5112799481995330621/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/mediawiki-and-learning-php.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/5112799481995330621'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/5112799481995330621'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/mediawiki-and-learning-php.html' title='MediaWiki and PHP'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-3754743471402696873</id><published>2009-05-25T14:56:00.075-04:00</published><updated>2009-05-26T15:33:19.239-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='OWW'/><category scheme='http://www.blogger.com/atom/ns#' term='electronic lab notebook'/><category scheme='http://www.blogger.com/atom/ns#' term='open science'/><category scheme='http://www.blogger.com/atom/ns#' term='design concept'/><title type='text'>A new direction</title><content type='html'>After today's debrief with Steve and the other undergrads, I've decided to continue my project by exploring the existing set of tools offered through OpenWetWare, rather than start from scratch and create a trac-specific tool which would be used by one particular research centre. A contribution to OWW would be worthwhile because of the potential to adapt it to the needs of scientists of any field - and exploring the idea of OWW as a collaborative (and why not inter-disciplinary?) platform. 
It would be interesting to later interface this platform with various research centres through their respective software management portals - which is more reasonable than creating a separate plugin for each research centre.&lt;br /&gt;&lt;br /&gt;On a related note, here is a summary of the five projects we discussed today (and will be developing this summer):&lt;br /&gt;&lt;ul type="square"&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Social network tool for Trac&lt;/span&gt; - analysis of codebase and issue tracking system, as well as mailing lists and a set of wiki pages for extracting information about users working on related projects;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Wiki markup for data visualization&lt;/span&gt; - GUI tool for embedding tables and dynamic graphs (possibly interfaced with external tools, like &lt;a href="http://manyeyes.alphaworks.ibm.com/manyeyes/"&gt;manyeyes&lt;/a&gt;) into wiki pages;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Green buildings&lt;/span&gt;: what can residential owners do to make their homes more energy-efficient and sustainable - more than a simple calculator of carbon footprint, this tool will use open-access government datasets (like &lt;a href="http://www.data.gov/"&gt;Data.gov&lt;/a&gt;) to determine which changes to an existing building are most efficient (prioritizing) and the savings generated in the long-term&lt;span style="text-decoration: underline;"&gt;&lt;span style="font-style: italic;"&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="text-decoration: underline; font-style: italic;"&gt;&lt;/span&gt;&lt;span style="font-style: italic;"&gt;Graph of browsing history&lt;/span&gt; (potentially as a Firefox extension) - generating a dynamic visualization of the pages visited by a user, with snapshots of each page for easy recognition - which could be then saved for future reference, or annotated;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: 
italic;"&gt;Electronic lab notebooks for scientists&lt;/span&gt; - what this blog is about.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Currently reading:&lt;/span&gt;&lt;br /&gt;&lt;a href="http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/"&gt;List of MediaWiki Extensions&lt;/a&gt; (source code)&lt;br /&gt;&lt;a href="http://meta.wikimedia.org/wiki/Category:MediaWiki_extensions"&gt;MediaWiki Extensions on Meta&lt;/a&gt; (descriptions)&lt;br /&gt;&lt;a href="http://commons.wikimedia.org/wiki/Special:Gadgets"&gt;MediaWiki Gadgets&lt;/a&gt;&lt;br /&gt;&lt;a href="http://blog.openwetware.org/developers/"&gt;OpenWetWare Developers - Bill Flanagan's Blog&lt;/a&gt;&lt;a href="http://openwetware.org/wiki/OpenWetWare:Getting_started_3"&gt;&lt;br /&gt;&lt;/a&gt;&lt;a href="http://openwetware.org/wiki/OpenWetWare:Software/Extensions"&gt;OpenWetWare Useful Extensions&lt;/a&gt;&lt;a href="http://www.mediawiki.org/wiki/MediaWiki"&gt;&lt;br /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-3754743471402696873?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/3754743471402696873/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/new-direction.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/3754743471402696873'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/3754743471402696873'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/new-direction.html' title='A new direction'/><author><name>maria 
yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-8985084272027641222</id><published>2009-05-22T16:18:00.002-04:00</published><updated>2009-05-22T16:19:56.665-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='trac'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='prototype'/><category scheme='http://www.blogger.com/atom/ns#' term='macros'/><title type='text'>Prototyping</title><content type='html'>At this point I started writing a bit of code - but instead of diving straight into the plugin modules, I decided to create a few macros first. Creating macros is a good starting point because it allows me to test extracting data from trac.db, and at the same time use these macros within my local trac environment to determine if these features are useful, without the complication of creating a GUI. 
Today I wrote two macros - SearchMacro and RelatedMacro - which was a helpful exercise in searching the database and extracting and manipulating wiki meta data.&lt;br /&gt;&lt;br /&gt;A note about the development environment: see the &lt;a href="http://trac.edgewall.org/wiki/TracDev/DevelopmentEnvironmentSetup"&gt;environment setup&lt;/a&gt; guide (or, at the very least, install tracdeveloperplugin from trac-hacks, which provides a very nice plugin registry with all interfaces and components, as well as several handy debug tools).&lt;br /&gt;&lt;br /&gt;Here's an overview of what my macros do: (&lt;span style="font-style: italic;"&gt;Note:&lt;/span&gt; these are only initial prototypes to test using these features within the context of a wiki page) &lt;ul type="square"&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;SearchMacro&lt;/span&gt; - parameters: author, topic, title, groupby and order. Searches through the project's wiki pages based on the specified criteria. The parameters are optional; if no parameters are specified, it displays a list of all wiki pages within the current project, grouped by author and sorted in ascending order.&lt;br /&gt;&lt;span style="font-style: italic;"&gt;--- author&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;topic&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;title&lt;/span&gt; take any string as an argument;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;--- groupby&lt;/span&gt; takes one of the following: ('author', 'date');&lt;br /&gt;&lt;span style="font-style: italic;"&gt;--- order&lt;/span&gt; takes one of the following: ('incr', 'decr').&lt;br /&gt;More parameters could be added easily (e.g. 
"num_results" to specify the number of results returned and "num_days" to choose results created within the last 5 days for example).&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;RelatedMacro&lt;/span&gt; - takes no parameters and displays a list of "author - title" pairs with links to related wiki pages. The idea is that this macro returns relevant results based on meta data about the user, without the user having to explicitly specify search terms (in contrast to SearchMacro).&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-style: italic;"&gt;Quick Reference:&lt;/span&gt;&lt;br /&gt;&lt;a href="http://trac.edgewall.org/wiki/TracDev/DataModels"&gt;Trac Data Models&lt;/a&gt; - WikiPage and Ticket models&lt;br /&gt;&lt;a href="http://trac.edgewall.org/wiki/TracEnvironment"&gt;The Trac environment&lt;/a&gt; &amp;amp; &lt;a href="http://trac.edgewall.org/wiki/TracIni"&gt;TracIni&lt;/a&gt;&lt;br /&gt;&lt;a href="http://trac.edgewall.org/wiki/TracDev/PluginDevelopment"&gt;Trac Plugin Development&lt;/a&gt; - list of extension points&lt;br /&gt;&lt;a href="http://trac.edgewall.org/wiki/TracDev/DatabaseApi"&gt;Database API&lt;/a&gt; &amp;amp; &lt;a href="http://trac.edgewall.org/browser/trunk/trac/db"&gt;/trunk/trac/db&lt;/a&gt;&lt;br /&gt;&lt;a href="http://genshi.edgewall.org/wiki/ApiDocs/genshi.builder"&gt;Genshi API&lt;/a&gt; - generating markup streams from python code&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Next:&lt;/span&gt;&lt;br /&gt;0) evaluate prototypes (+more design decisions)&lt;br /&gt;1) learn &lt;a href="http://pygtk.org/pygtk2tutorial/"&gt;pyGTK&lt;/a&gt; (library for creating GUIs)&lt;br /&gt;2) start coding the actual modules, with graphical interface&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-8985084272027641222?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link 
rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/8985084272027641222/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/prototyping.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/8985084272027641222'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/8985084272027641222'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/prototyping.html' title='Prototyping'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-572948086740620377</id><published>2009-05-21T14:04:00.000-04:00</published><updated>2009-05-21T14:04:12.765-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='plugins'/><category scheme='http://www.blogger.com/atom/ns#' term='API'/><category scheme='http://www.blogger.com/atom/ns#' term='trac'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><title type='text'>The Trac Plugin API</title><content type='html'>More preliminary reading.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/__iI3woR69XA/ShVkFSEm29I/AAAAAAAAAAU/JqJ6vmZXfXk/s1600-h/componentdiagram.png"&gt;&lt;img style="margin: 0pt 5px 5px 0pt; cursor: pointer; width: 400px; height: 156px;" src="http://3.bp.blogspot.com/__iI3woR69XA/ShVkFSEm29I/AAAAAAAAAAU/JqJ6vmZXfXk/s400/componentdiagram.png" alt="Component Architecture Diagram" 
id="BLOGGER_PHOTO_ID_5338282975230876626" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://trac.edgewall.org/wiki/TracDev/ComponentArchitecture"&gt;TracDev - Component Architecture&lt;/a&gt;&lt;br /&gt;The plugin API for Trac: trac.core defines four public classes: Component, ExtensionPoint, Interface and ComponentManager. Each unit of functionality (i.e. a service) is modelled by the Component class and can declare one or more ExtensionPoints, which other components plug into by implementing an Interface. A plugin is itself a Component: it can implement interfaces to connect to other components by way of their extension points, and it can also provide extension points of its own. Here's an example, where WikiPage is the core component, IWikiPageManipulator is its interface, and WikiPagePlugin is another component which extends it:&lt;br /&gt;&lt;pre name="code" class="python"&gt;from trac.core import Component, ExtensionPoint, Interface, implements&lt;br /&gt;&lt;br /&gt;class IWikiPageManipulator(Interface):&lt;br /&gt;   def some_method(some_value):&lt;br /&gt;       """Interface methods are declared without self."""&lt;br /&gt;&lt;br /&gt;class WikiPage(Component):&lt;br /&gt;   plugins = ExtensionPoint(IWikiPageManipulator)&lt;br /&gt;&lt;br /&gt;   def some_action(self, some_value):&lt;br /&gt;       # call every enabled component that implements the interface&lt;br /&gt;       for plugin in self.plugins:&lt;br /&gt;           plugin.some_method(some_value)&lt;br /&gt;&lt;br /&gt;class WikiPagePlugin(Component):&lt;br /&gt;   implements(IWikiPageManipulator)&lt;br /&gt;&lt;br /&gt;   def some_method(self, some_value):&lt;br /&gt;       print(some_value)&lt;br /&gt;&lt;/pre&gt;Some useful interfaces/points of entry:&lt;br /&gt;&lt;a href="http://trac.edgewall.org/browser/trunk/trac/core.py#latest"&gt;trac.core&lt;/a&gt; - definitions of the four core classes (read as reference);&lt;br /&gt;&lt;a href="http://trac.edgewall.org/browser/trunk/trac/wiki/api.py"&gt;trac.wiki.api&lt;/a&gt; - provides IWikiChangeListener (for keeping track of changes/history), IWikiMacroProvider and IWikiSyntaxProvider (extending the core macros and 
syntax - could write macros for embedding functionality such as data visualization);&lt;br /&gt;&lt;a href="http://trac.edgewall.org/browser/trunk/trac/env.py"&gt;trac.env&lt;/a&gt; - components are usually initialized with env (as a component manager); used to store project information (configuration file, sqlite db, templates, plugins, wiki, tickets - in a directory structure);&lt;br /&gt;&lt;a href="http://trac.edgewall.org/browser/trunk/trac/env.py#L273"&gt;trac.env.Environment.get_db_cnx()&lt;/a&gt; - the most common entry point for a database connection; also useful: get_repository(), get_known_users() - this is probably more useful for the social network graph project;&lt;br /&gt;&lt;a href="http://trac.edgewall.org/browser/trunk/trac/search/api.py"&gt;trac.search.api&lt;/a&gt; - ISearchSource interface (search filters) and the handy search_to_sql() function;&lt;br /&gt;&lt;a href="http://trac.edgewall.org/browser/trunk/trac/versioncontrol/api.py"&gt;trac.versioncontrol.api&lt;/a&gt; - could be used for exploring the repository; get_history() and get_annotations() for annotated revision history of a particular Node (directory or file); this relates to the idea of tracking relationships between files which were changed in a short period of time;&lt;br /&gt;&lt;a href="http://trac.edgewall.org/browser/trunk/trac/perm.py"&gt;trac.perm&lt;/a&gt; - permissions.&lt;br /&gt;&lt;br /&gt;Even more importantly: &lt;a style="font-weight: bold;" href="http://trac.edgewall.org/wiki/TracDev/DatabaseSchema"&gt;the database schema&lt;/a&gt; - access to all sorts of useful information. 
The wiki table holds data about all wiki pages and could be searched on 'name', 'text' or 'comment' to implement the search tool widget I mentioned in an earlier post. The 'comment' attribute serves as metadata for wiki entries. One could also select all wiki pages edited by a specific author and order them by time to provide a chronological history of wiki edits for a particular user (think about how to extend this to record local changes to a particular wiki page, as opposed to global changes such as adding new pages).&lt;br /&gt;&lt;br /&gt;Next: some design decisions to be made - which specific features would be supported by the plugin, and which extension point interfaces could I make use of? Having looked at the plugin API and related documentation, I'm starting to think that I can classify the features of my potential plugin as:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Those which &lt;span style="font-style: italic;"&gt;extract information&lt;/span&gt; from the project database and provide it to the user. Example: the search feature. Implementation: could consist of a set of well-written queries into trac.db. User interface: custom macro.&lt;/li&gt;&lt;li&gt;Those which make &lt;span style="font-style: italic;"&gt;configuration changes&lt;/span&gt; and extend the wiki functionality. Example: contextual commenting system. 
Implementation &amp;amp; user interface: still need to investigate these (meaning I'll be spending lots of time reading source code on trac-hacks :-)&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-572948086740620377?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/572948086740620377/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/trac-plugin-api.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/572948086740620377'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/572948086740620377'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/trac-plugin-api.html' title='The Trac Plugin API'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/__iI3woR69XA/ShVkFSEm29I/AAAAAAAAAAU/JqJ6vmZXfXk/s72-c/componentdiagram.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-6936091858804279336</id><published>2009-05-20T10:32:00.010-04:00</published><updated>2009-05-20T13:14:49.322-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='plugins'/><category scheme='http://www.blogger.com/atom/ns#' term='trac'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><title 
type='text'>Developing software tools for existing configuration management system</title><content type='html'>As a starting point for developing a set of tools that make up an electronic lab notebook, I'll focus on implementing it on top of the FCM (Flexible Configuration Management system) currently in use at the Hadley Centre. The FCM uses &lt;a href="http://subversion.tigris.org/"&gt;svn&lt;/a&gt; for version control and &lt;a href="http://trac.edgewall.org/"&gt;Trac&lt;/a&gt; for issue tracking. Since Trac is a software management portal which consists of a wiki, an issue tracker and a subversion repository browser, it would make sense to focus on building a plugin for the Trac wiki (which is already being used by scientists for recording information about their climate simulations...but I need Steve's notes on that).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Specifics of implementation:&lt;/span&gt;&lt;br /&gt;&lt;a href="http://trac-hacks.org/"&gt;trac-hacks.org&lt;/a&gt; is a site which provides svn hosting of community-created Trac plugins, macros, scripts, themes and patches. &lt;a href="http://trac.edgewall.org/wiki/TracPlugins"&gt;Trac plugins&lt;/a&gt; are packaged as &lt;a href="http://peak.telecommunity.com/DevCenter/PythonEggs"&gt;Python eggs&lt;/a&gt;. Python modules and packages are distributed using &lt;a href="http://docs.python.org/distutils/introduction.html"&gt;distutils&lt;/a&gt; (&lt;a href="http://peak.telecommunity.com/DevCenter/setuptools"&gt;setuptools&lt;/a&gt;, an extension to distutils, is used to build eggs).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Currently:&lt;/span&gt;&lt;br /&gt;I created a local svn repository (using some &lt;a href="http://sourceforge.net/scm/?type=svn&amp;amp;group_id=225104"&gt;statsjam source files&lt;/a&gt;) &amp;amp; a Trac project for testing purposes (it's useful to modify the default Trac permissions). 
I also created several python eggs from existing projects to test using setuptools. Everything's working OK so far. Next I'll examine an existing &lt;a href="http://trac-hacks.org/wiki/WikiStatsPlugin"&gt;trac plugin&lt;/a&gt;'s source code as a useful example and starting point for writing my python modules.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-6936091858804279336?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/6936091858804279336/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/developing-software-tools-for-existing.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/6936091858804279336'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/6936091858804279336'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/developing-software-tools-for-existing.html' title='Developing software tools for existing configuration management system'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-4757263548192715037</id><published>2009-05-19T03:49:00.135-04:00</published><updated>2009-05-19T13:14:23.139-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='electronic lab notebook'/><category scheme='http://www.blogger.com/atom/ns#' term='open science'/><title 
type='text'>Electronic Lab Notebooks</title><content type='html'>It seems that the majority of scientists who record experimental data electronically use one of a small set of tools: a &lt;span style="font-style: italic;"&gt;wiki page&lt;/span&gt; (e.g. the climate change scientists at the Hadley Centre; &lt;a href="http://openwetware.org/wiki/Main_Page"&gt;OpenWetWare&lt;/a&gt; at MIT), a &lt;span style="font-style: italic;"&gt;blog&lt;/span&gt; (e.g. a &lt;a href="http://usefulchem.wikispaces.com/Exp219"&gt;chemical experiment&lt;/a&gt;, part of the Open Notebook Science initiative [1]), or &lt;span style="font-style: italic;"&gt;spreadsheet software&lt;/span&gt; (e.g. the scientists from a medical imaging group at UHN).&lt;br /&gt;&lt;br /&gt;None of these tools is sufficiently suited to scientists' needs, for two major reasons: &lt;span style="font-style: italic;"&gt;firstly&lt;/span&gt;, all of the information needs to be entered manually (and since a large part of experimental writeups is similar, a lot of the information needs to be copy-pasted from previous pages -- this process is not optimized); &lt;span style="font-style: italic;"&gt;secondly&lt;/span&gt;, the relationship between related experiments/simulations/papers/authors is not immediately obvious because the linking between related pages is ineffective (if it exists).&lt;br /&gt;&lt;br /&gt;Here's a list of preliminary ideas for some features and functionality (some, but not all, of these are already implemented, see OpenWetWare [2]):&lt;br /&gt;&lt;ul type="square"&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Modularity&lt;/span&gt; - after considering the possibility of building wiki page templates, I think that a template for an entire page, being suited to the particular requirements of one experiment, would be too specific to the particular research group for which it was developed. Instead, dividing up the structure of the page into separate independent modules (i.e. 
thematic sections), which could be independently added, modified and deleted, would allow for quick &amp;amp; easy customization as well as adaptability to different contexts and experiments (flexibility is important).&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Extended functionality&lt;/span&gt; - a suite of widgets, embedded in a side bar for example, could provide useful features such as searching through existing wiki experiment pages, which would allow a user to find and preview related experiments (as well as re-use content). Example: User A is writing a wiki page about climate simulation A; User B already wrote a very similar wiki page about climate simulation B; User A could use the search widget to find wiki page B and import one or more modules, so that User A can avoid re-entering repetitive information and focus only on editing the bits that differ.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Cross-linking&lt;/span&gt; - using the search widget, finding related experiments as well as external resources would be easy;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;History of Recent Changes&lt;/span&gt; - as the user creates a wiki page, adds modules and searches through related papers, it could be useful to record every change made to the wiki page, chronologically;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Metadata&lt;/span&gt; - tags (categorical and descriptive? [4]) to allow for easy browsing &amp;amp; searching through wiki pages;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Permissions&lt;/span&gt; - since experiment wiki pages would ideally be searchable and cross-linked, there could be a feature marking certain modules as 'private' (e.g. for personal annotations);&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Import/Export of Data&lt;/span&gt; - embed data sets into a wiki page (e.g. 
&lt;a href="http://sourceforge.net/projects/statsjam/"&gt;Statsjam&lt;/a&gt; - an extension to MediaWiki which allows users to embed database queries and dynamic visualizations into their wiki pages); ability to "export" page summary in a portable format (e.g. reference-card format with abstract and conclusions);&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Contextual Comments&lt;/span&gt; - allow users browsing a lab notebook to make contextual comments/questions related to a particular section (science as a collaborative effort - [1] - see example of comment system below).&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Related readings:&lt;br /&gt;&lt;dl&gt;&lt;dt&gt;&lt;a href="http://www.scientificamerican.com/article.cfm?id=science-2-point-0"&gt;[1] M. Waldrop, "Science 2.0: Is Open Access Science the Future?", Scientific American&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;Making scientific experiments and results available online to allow collaboration and boost productivity. Need for a medium allowing discussion and peer-review; current initiatives: &lt;a href="http://www.plosone.org/home.action"&gt;PLoS&lt;/a&gt; (Public Library of Science - an online, peer-reviewed open-access publication). The issue of balancing personal credit for research work (traditionally done through journal publications) and the increased efficiency of collaborative science (with a lesser emphasis on individual contributions). 
Case in point: &lt;a href="http://www.news.harvard.edu/gazette/2008/02.14/99-fasvote.html"&gt;Harvard's open access FAS repository&lt;/a&gt;.&lt;br /&gt;&lt;/dd&gt;&lt;dt&gt;&lt;br /&gt;&lt;/dt&gt;&lt;dd&gt;In relation to electronic lab notebooks: to allow more tangible communication with a focus on a particular article/experiment, allow contextual commentary to be added to electronic lab notes for targeted feedback (much like the contextual comment system here: &lt;a href="http://www.djangobook.com/en/2.0/chapter01/"&gt;the django book&lt;/a&gt; - click on side bar to explore the comment system).&lt;br /&gt;&lt;/dd&gt;&lt;dd&gt;&lt;br /&gt;&lt;/dd&gt;&lt;dd&gt;&lt;a href="http://www.americanscientist.org/issues/pub/wheres-the-real-bottleneck-in-scientific-computing/1"&gt;G. Wilson, "Where's the Real Bottleneck in Scientific Computing?", 2006, American Scientist&lt;br /&gt;&lt;/a&gt;&lt;/dd&gt;&lt;dd&gt;"Computational illiteracy" among computational scientists prevents them from making use of the advances in modern software engineering practice. In addition, distrust towards the field of 'computer science' (as a source of complexity and potential decrease in productivity, at least at the onset of adopting new software tools) further hinders improvement of scientific software practices. 
(It is essential that the tools provided to scientists are not only effective but also user-friendly; case in point: subversion &amp;amp; trac used as building blocks for the &lt;a href="http://www.cs.toronto.edu/%7Esme/papers/2008/CiSE-FCMpaper.pdf"&gt;software management system&lt;/a&gt; at Hadley Centre were successful &lt;span style="font-style: italic;"&gt;only because of the added simplifications&lt;/span&gt; provided by the FCM).&lt;/dd&gt;&lt;dt&gt;&lt;br /&gt;&lt;/dt&gt;&lt;dt&gt;&lt;a href="http://openwetware.org/wiki/Lab_Notebook"&gt;[2] OpenWetWare launched at MIT, Open Science, Lab Notebook&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;&lt;span style="font-style: italic;"&gt;Features&lt;/span&gt;:&lt;br /&gt;&lt;ul type="square"&gt;&lt;li&gt;calendar (easy access to notes by date selection),&lt;br /&gt;&lt;/li&gt;&lt;li&gt;search (within own project),&lt;br /&gt;&lt;/li&gt;&lt;li&gt;wiki page template (ability to customize own page template, but same for all experiments),&lt;br /&gt;&lt;/li&gt;&lt;li&gt;modularity (separate editable sections)&lt;/li&gt;&lt;li&gt;recent changes (feed of the latest changes made to own notebook)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;What can be improved (based on my preliminary ideas list above)&lt;/span&gt;:&lt;br /&gt;&lt;ul type="square"&gt;&lt;li&gt;extend search functionality to all community projects (as opposed to own notebook)&lt;/li&gt;&lt;li&gt;copy-paste of sections from previous note pages (i.e. 
importing modules to reduce manual input of repeated information)&lt;/li&gt;&lt;li&gt;import/export of data - embedding of relevant data (&lt;a href="http://sourceforge.net/projects/statsjam/"&gt;statsjam&lt;/a&gt;) and exporting a page summary for easy review/reference&lt;/li&gt;&lt;li&gt;extend recent changes feed to all science notebooks, not just one's own (better yet: show recent changes made to &lt;span style="font-style: italic;"&gt;related&lt;/span&gt; notebooks)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/dd&gt;&lt;br /&gt;&lt;dt&gt;&lt;a href="http://portal.acm.org/citation.cfm?id=1248886"&gt;[3] J. Carver et al., "&lt;strong style="font-weight: normal;"&gt;Software Development Environments for Scientific and Engineering Software&lt;/strong&gt;&lt;strong style="font-weight: normal;"&gt;: A Series of Case Studies&lt;/strong&gt;", 2007, IEEE Computer Society&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;DARPA-sponsored case studies of software development for computational science - purpose: to explore the software engineering process used in the field of scientific computation and identify tools which would improve scientific productivity. 
Findings:&lt;br /&gt;&lt;ul type="square"&gt;&lt;li&gt;Differences between scientific software projects and their commercial IT counterparts: in scientific computation outcomes of computation are largely unknown; ensuring validity of scientific principles takes priority over software engineering principles; code is developed by scientists, not by trained software engineers.&lt;/li&gt;&lt;li&gt;To address these issues: &lt;a href="http://www.highproductivity.org/"&gt;HPCS&lt;/a&gt; (High Productivity Computing Systems) project, which aims to double productivity rates of large-scale computing systems, by assessing the role of software engineering principles in affecting the development time of computational projects.&lt;/li&gt;&lt;li&gt;Methodology: study conducted is very similar, both in approach and preliminary questions of interest, to &lt;a href="http://www.cs.utoronto.ca/%7Esme/papers/2008/Easterbrook-Johns-2008.pdf"&gt;Steve's paper&lt;/a&gt; on the Hadley Centre, which confirms the validity of these findings across multiple research centres and projects.&lt;/li&gt;&lt;li&gt;Observations: of the five projects studied, three use Fortran as the primary language for scientific computation and two use C++; validation and verification done through comparison with past experiments - integration testing (though one project used hand integration of every line of code!); issues/concerns: need for portability (platforms ranging from PCs to parallel supercomputers), documentation (when present) proves useful.&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;/dd&gt;&lt;dt&gt;&lt;a href="http://portal.acm.org/citation.cfm?id=1458603"&gt;[4] M. Strohmaier, "Purpose Tagging: Capturing User Intent to Assist Goal-Oriented Social Search", 2008, Graz University of Technology&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;Strategy of tagging - category vs. description tags. 
Use of tags to convey user goals - extracting intent from content.&lt;br /&gt;&lt;/dd&gt;&lt;/dl&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-4757263548192715037?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/4757263548192715037/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/electronic-lab-notebooks.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/4757263548192715037'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/4757263548192715037'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/electronic-lab-notebooks.html' title='Electronic Lab Notebooks'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-6485126134413887807</id><published>2009-05-19T00:29:00.044-04:00</published><updated>2009-05-19T03:40:12.752-04:00</updated><title type='text'>Project starting points</title><content type='html'>Out of the topics discussed over the course of the past few days, several seem to have emerged as both significant in their impact on climate modelling research, and viable to attempt development and implementation over the summer.&lt;br /&gt;&lt;br /&gt;The climate research effort falls in the domain of computational science, where scientific models are typically written and 
maintained by climate scientists who are usually unaware of or uninterested in software engineering practices and tools. The extent to which software engineering processes and tools could improve the working environment of climate modelling centres has only recently become a topic of interest [1]. A study of the Hadley Centre [2] reveals that careful selection of a software configuration management system for managing the centre's code base has had a largely positive impact on productivity and has shortened development release cycles.&lt;br /&gt;&lt;br /&gt;Annotated reading:&lt;br /&gt;&lt;dl&gt;&lt;a href="http://www.cs.utoronto.ca/%7Esme/papers/2008/Easterbrook-Johns-2008.pdf"&gt;[1] S. Easterbrook and T. Johns, "Engineering the Software for Understanding Climate Change", SE Wiki&lt;/a&gt;&lt;br /&gt;&lt;dd&gt;Advances in software development practice, as described from a software engineering standpoint, are not matched by corresponding practices in most of the large climate change research centres. The specific software needs of science researchers include the use of &lt;span style="font-style: italic;"&gt;older programming languages&lt;/span&gt; (e.g. Fortran) for modelling and simulation due to the long lifespan of projects; the ability to keep track of &lt;span style="font-style: italic;"&gt;code versioning used in individual experiments&lt;/span&gt;; as well as the need to arrive at &lt;span style="font-style: italic;"&gt;reproducible code results&lt;/span&gt;.&lt;br /&gt;&lt;/dd&gt;&lt;dd&gt;&lt;br /&gt;&lt;/dd&gt;&lt;dd&gt;Further, scientific code verification and validation is difficult to define since the "correct" results are not known prior to experimentation. 
Techniques used include &lt;span style="font-style: italic;"&gt;bit comparison&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;integration testing&lt;/span&gt;.&lt;br /&gt;&lt;/dd&gt;&lt;dd&gt;&lt;br /&gt;&lt;/dd&gt;&lt;dd&gt;It is interesting to note that advances in computing power &lt;span style="font-style: italic;"&gt;do not&lt;/span&gt; lead to faster model runs; instead, the complexity of the models is increased to improve accuracy and reduce the number of parameters, while run times stay at roughly several weeks to several months for large coupled GCMs (general circulation models).&lt;br /&gt;&lt;/dd&gt;&lt;dd&gt;&lt;br /&gt;&lt;/dd&gt;&lt;dd&gt;A case study of the Hadley Centre reveals that &lt;span style="font-style: italic;"&gt;open-source tools&lt;/span&gt; such as Subversion and Trac serve as excellent building blocks for scientific software development practices, as the role of scientific researchers overlaps with that of some members of the open-source community. Specifically, scientific computation requires that scientists are both users and developers of their code, which may have interesting effects on code reliability (Would code tend to be less defect-prone since its developers are most familiar with its application? Or, on the other hand, would code tend to be less reliable since the main developers are largely unqualified in the field of software development?)&lt;/dd&gt;&lt;dt&gt;&lt;br /&gt;&lt;/dt&gt;&lt;dt&gt;&lt;a href="http://www.cs.toronto.edu/%7Esme/papers/2008/CiSE-FCMpaper.pdf"&gt;[2] D. Matthews, G. Wilson and S. 
Easterbrook, "Configuration Management for Large-Scale Scientific Computing at the UK Met Office", online paper&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;The scientific software development process depends on two things: the amount of time it takes scientists to develop the code and the amount of time it takes the supercomputers to run the GCM simulation. Since the latter is constantly being improved as per Moore's Law, the conclusion is that the &lt;a href="http://www.americanscientist.org/issues/pub/wheres-the-real-bottleneck-in-scientific-computing/1"&gt;real problem&lt;/a&gt; causing inefficiency in scientific computing is the lack of an effective code development process.&lt;/dd&gt;&lt;dt&gt;&lt;br /&gt;&lt;/dt&gt;&lt;dd&gt;&lt;span style="font-style: italic;"&gt;Characteristics of a successful configuration management system&lt;/span&gt;:&lt;/dd&gt;&lt;/dl&gt;&lt;ul type="square"&gt;&lt;li&gt;Code maintainability (as a way of improving development practices) is often more important than improvements in computing power&lt;/li&gt;&lt;li&gt;The &lt;span style="font-style: italic;"&gt;combination&lt;/span&gt; of open-source tools with a custom-made FCM interface allows much-needed independence from external sources, while the FCM system lowers the adoption costs and customizes the system to simplify common tasks&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Cooperation between IT specialists (familiar with the scientists' needs) and management (through initiative and support)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;A service, not a tool (ongoing software support) - plenty of documentation and live support&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Thoughtful migration process - old code + history was imported to svn and tickets were imported into Trac, i.e. 
minimal disruption to ongoing code development&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-6485126134413887807?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/6485126134413887807/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/project-starting-points.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/6485126134413887807'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/6485126134413887807'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/project-starting-points.html' title='Project starting points'/><author><name>maria yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5882627736725710289.post-1052429025339471198</id><published>2009-05-19T00:02:00.006-04:00</published><updated>2009-05-21T14:54:18.744-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='climate change'/><category scheme='http://www.blogger.com/atom/ns#' term='software engineering'/><title type='text'>Introduction.</title><content type='html'>I'll be using this blog to record thoughts and impressions related to my current research interests, with the hope that ideas of interest may come out of my daily musings - and to prevent such elusive occurrences from getting lost in the mass of the 
inconsequential.&lt;br /&gt;&lt;br /&gt;I have just completed my first year in the Engineering Science program at the University of Toronto, with an interest in computer science. My current summer position is at the Department of Computer Science, working on a new research project aiming to define the role of software engineering tools in enabling climate change research. The project is not limited to a particular application, but if any worthwhile software tools end up being developed this summer they will be tested at the &lt;a href="http://www.metoffice.gov.uk/climatechange/"&gt;Hadley Centre&lt;/a&gt; (MetOffice UK, see also: &lt;a href="http://www.metoffice.gov.uk/research/nwp/numerical/unified_model/"&gt;the unified model&lt;/a&gt; and the &lt;a href="http://www.metoffice.gov.uk/research/nwp/external/fcm/"&gt;FCM&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;My supervisor is &lt;a href="http://www.cs.toronto.edu/%7Esme"&gt;Steve Easterbrook&lt;/a&gt; (see also &lt;a href="http://www.easterbrook.ca/steve"&gt;Serendipity&lt;/a&gt;).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5882627736725710289-1052429025339471198?l=abelian-grape.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://abelian-grape.blogspot.com/feeds/1052429025339471198/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/introduction.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/1052429025339471198'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5882627736725710289/posts/default/1052429025339471198'/><link rel='alternate' type='text/html' href='http://abelian-grape.blogspot.com/2009/05/introduction.html' title='Introduction.'/><author><name>maria 
yancheva</name><uri>http://www.blogger.com/profile/14022753457227092077</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
