February 26, 2013

Tricky stuff with Sql Server Spatial part 1

Dear reader(s),

Last week I held an internal lecture about SQL Server Spatial and the basics of GIS here at knowit.
I really like talking about GIS as it is about the world and big amounts of data and I do like the world and big  amounts of data.

Well... where I am? who am I?

Oh, yeah. While preparing I noticed a few things that can be a bit tricky with SQL Server Spatial.

This is the first one.

Preparations:

I use a table called test which basically holds one geography column named spatial and another varchar() called name.

Inserts:


INSERT INTO Test (Name, Spatial)
VALUES(
    'Polygon',
    geography::STGeomFromText(
         'POLYGON((-1 -1, 8 -1 , 8 8 ,-1 8, -1 -1))',4326));

So this basically inserts a polygon into the world. As you can see it is square and crosses the equator. The number 4326 is called the SRID and basically means that the coordinates are calculated according to the projection WGS84. (projections basically means that you get different sets of coordinates depending on how you try to put the round globe on to a square piece of paper (a map). Look here for more info.).

Well, besides the 4326 this is no sweat... now try doing it this way:


INSERT INTO Test (Name, Spatial)
VALUES(
    'Polygon',
    geography::STGeomFromText(
         'POLYGON((-1 -1, -1 8 , 8 8 ,8 -1, -1 -1))',4326))

Exactly the same call, but we put the coordinates the opposite way. Now it is no longer a square polygon crossing the equator but instead it is all but a square polygon crossing the equator. It is covering the rest of the world.

Depending on if you add point clockwise or counter clockwise gives you two totally different polygons.

Why?

Well the world IS actually round meaning that any set of points that makes up a polygon on the surface of the world has two meanings. An interior version and an exterior version. Counter clockwise gives the interior version, clockwise the exterior. Easy to forget...

More about it here.


February 9, 2013

Taxonomy pt 3

In my last adventure in taxonomies I had hopes of using Neo4J and Neo4JClient to implement a taxonomy. However, I could never implement the Cypher-queries that worked in Neo4J through the .Net-library Neo4JClient. I got a "400 bad request"-error, struggled, tore my imaginary hair, gave up and changed the route to QuickGraph.

QuickGraph is a graph engine and not a database. It is not a big issue when it comes to a small taxonomy, but would pose problems for huge taxonomies since it holds all data in memory. For me it was a relief, since QuickGraph is something I know and feel comfortable with.

Installation of QuickGraph is straightforward. Just download, build and set a reference.

As a reminder a taxonomy is something where the meaning of a certain term is valid over a specific time. In this case the time is in which revision the term was defined. Like this:



I am using a directed graph where each revision of the taxonomy is descendant of the last revision.

As a practice set I used the gulls in the last blog (check it out).

Step 1.  Create an object that will hold each gull.


    public class Taxon 
    {
        public Guid TaxonId { get; set; }
        public string ScientificName { get; set; }
        public string CommonName { get; set; }
        public int Revision { get; set; }
    }


Since each scientific name can be used in several revisions I use a guid to know which unique taxon it is.

Step 2. Instantiate a directed graph.

BidirectionalGraph<Taxon, IEdge<Taxon>> graph = new BidirectionalGraph<Taxon, IEdge<Taxon>>();

This basically means that the nodes will be of the type Taxon and that the edges will connect two Taxon objects. Bidirectional means that each nod will no who it is descendent from and who derives from it.

Step 3. Implement the InsertTaxon function.


        public void InsertTaxon(Taxon taxon, List<Taxon> ancestors, BidirectionalGraph<Taxon, IEdge<Taxon>> graph)
        {
            graph.AddVertex(taxon);
            foreach (Taxon item in ancestors)
            {
                TaggedEdge<Taxon, double> outEdge = new TaggedEdge<Taxon, double>(taxon, item, 1);
                graph.AddEdge(outEdge);

            }
        }

We're basically inserting the taxon for a new revision of a taxonomy and connects it to all taxons it derives from. By using a TaggedEdge we have a weighted graph. I just default it to one though.

Step 4. Implement the FindTaxon method.


        public IEnumerable<Taxon> FindTaxon(String scientificName, int fromRevision, int toRevision)
        {
            //Find the taxon with the same scientific name from the chosen revision
            Taxon searchTaxon = graph.Vertices.Where(s => s.ScientificName == scientificName && s.Revision == fromRevision).First();
            //use that taxon to find the possible taxons at the to revision
            List<Taxon> results = new List<Taxon>();
            results.AddRange(GetAllDescendantsAtTime(searchTaxon, toRevision));
            return results.Distinct();
        }


        private List<Taxon> GetAllDescendantsAtTime(Taxon searchTaxon, int toRevision)
        {
            List<Taxon> results = new List<Taxon>();
            foreach (Taxon inNode in graph.InEdges(searchTaxon).Where(s => s.Source.Revision <= toRevision && s.Source.Revision > searchTaxon.Revision).Select(n => n.Source))
            {
                if (inNode.Revision != toRevision)
                    results.AddRange(GetAllDescendantsAtTime(inNode, toRevision));
                else
                    results.Add(inNode);
            }
            return results;
        }

So what we basically do is to find what say a Herring Gull was at revision 1 of the taxonomy and then see what it might be at revision 3.
We accomplish this by walking the graph recursively until we find all revision 3 species that are  based on the Herring Gull. The answer would be European Herring Gull, Yellow Legged Gull, Vega Gull, Caspian Gull and American Herring Gull. 

To note: This approach would need a specific graph for each taxonomy and, yeah, there are probably some algorithms that solves the problem faster than mine.

For me: This is perfect! Now I will just make a complete interface to it and use it in all my birding projects!

See you!