SEC file format

The key to SEC files is using regular expressions, anchored on the Hex, UWP, and PBG. Those are relatively easy to find, after which the remaining data falls into place. Thus your code won't care how wide the world name is, etc.
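A minimal sketch of that regex approach, in Python. The pattern is an assumption about what a typical SEC system line looks like (a four-digit hex, a UWP such as A788899-C, and a three-digit PBG), not a published specification. The group names and the sample line are made up for illustration.

```python
import re

# Hypothetical pattern for one SEC system line. It anchors on the hex,
# the UWP and the PBG; the name and the fields between them can be any width.
SEC_LINE = re.compile(
    r"^(?P<name>.*?)\s*"
    r"(?<!\S)(?P<hex>\d{4})\s+"              # hex location, e.g. 1910
    r"(?P<uwp>[0-9A-Z]{7}-[0-9A-Z])(?!\S)"  # UWP, e.g. A788899-C
    r"\s*(?P<middle>.*?)\s*"                 # bases, trade codes, zone
    r"(?<!\S)(?P<pbg>\d{3})(?!\S)"           # PBG, e.g. 703
    r"\s*(?P<rest>.*)$"                      # allegiance, stellar data
)


def parse_sec_line(line):
    """Return the fields of a system line as a dict, or None for other lines."""
    match = SEC_LINE.match(line.rstrip())
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


sample = "Sample       1910 A788899-C  N  Ri Pa Ph An Cp     703 Im F7 V"
print(parse_sec_line(sample))
```

Header and comment lines simply fail to match and come back as None, so the caller can skip them without knowing where the data starts.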
 
kristof65 said:
Just an idea to throw out there, given my limited knowledge of the .sec format and XML, but wouldn't XML be the "solution" to this?

As I understand it, part of the XML file structure defines the data structure within, in terms that any XML-capable program can translate. So if someone were to convert the .sec files to XML, anyone with recent versions of MS Office and OpenOffice (among others) could use the files, while specialty programs could be written to extract the appropriate data from any XML sector file that used the correct headings, extra data or different formatting excluded.

All that would really need to be defined by the community would be a set of "standard" tags.

XML is sort of heavyweight, really. What you need is a basic set of key/value pairs for each system in a sector, so a CSV format would work very well. Basically, you can store the key names in the first line, like

Name, Starport, ....

and then have the rest of the lines hold the values for each system. This would be very simple and could be easily parsed by utilities.

Again, I may not know what I'm talking about, but it seems like that's what XML is really for.
 
dorward said:
CSV is not a nice format; it is even less readable than XML.

I'd probably be tempted to use the Config::General format: http://search.cpan.org/~tlinden/Config-General-2.42/General.pm#CONFIG_FILE_FORMAT

... except I don't know what the state of play for parsers for it is outside Perl land.

Do you mean human-readable or machine-readable? It is very good as a machine-readable format, as it is trivial to parse. As far as human readability goes, it is not too bad. Just load it in Excel or OpenOffice.

The scheme is pretty simple:
1.) Load the first line to get the position of all the keys.
2.) Each following line is a separate system, with its values in the order defined by line 1 of the file.

You can store the sector file as a list of dictionaries in memory where each line is stored as a dictionary. This gives you access to keys/values for each system.

For example the file:

Name, Starport
Foo, A
Bar, C

would parse into the list [{"Name": "Foo", "Starport": "A"}, {"Name": "Bar", "Starport": "C"}] in Python, once the space after each comma is skipped. It should be just as simple in Perl.
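The same scheme can be sketched with the standard library's csv.DictReader, which takes the keys from the first line. Note skipinitialspace=True: without it, the blank after each comma in this example would end up in the keys and values (" Starport", " A").

```python
import csv
import io

# The example file from above, held in a string for the sake of the sketch
sample = """Name, Starport
Foo, A
Bar, C
"""


def parse_header_csv(text):
    # DictReader uses the first line as the keys; skipinitialspace drops
    # the blank that follows each comma in this particular file.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [dict(row) for row in reader]


systems = parse_header_csv(sample)
print(systems)  # [{'Name': 'Foo', 'Starport': 'A'}, {'Name': 'Bar', 'Starport': 'C'}]
```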
 
dorward said:
CSV is not a nice format; it is even less readable than XML.

I'd probably be tempted to use the Config::General format: http://search.cpan.org/~tlinden/Config-General-2.42/General.pm#CONFIG_FILE_FORMAT

... except I don't know what the state of play for parsers for it is outside Perl land.

Config format is supported in the Python standard library, and there's a top-notch open-source third-party library that's even better. Config would be my preferred choice, as it's extremely flexible and has excellent library support. It's also highly human-readable. JSON would be my second choice.
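For comparison, a minimal sketch of one system stored in an INI-style file and read with the standard library's configparser module. Two caveats: configparser handles INI syntax, not the Apache-style syntax of Perl's Config::General, and the section and key names below are hypothetical, not an agreed schema.

```python
import configparser

# Hypothetical layout: one section per system, named by its hex location
sample = """
[1910]
name = Sample
uwp = A788899-C
trade_codes = Ri Pa Ph
pbg = 703
history = Free-text notes can go here.
"""


def load_systems(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Each section becomes a plain dict; optional keys (like history)
    # only appear for the systems that have them.
    return {hexnbr: dict(parser[hexnbr]) for hexnbr in parser.sections()}


systems = load_systems(sample)
print(systems["1910"]["uwp"])  # A788899-C
```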

XML is OK, but it's less human-readable and more work to machine-parse. I'd live with it, though.

CSV is very compact, so it would be good for storing whole sectors of data. With a header line it can also be reasonably flexible. I'd be fine with it only if the column meanings are defined by the headers and not hard-coded; otherwise it's almost as rigid as fixed-field.

Simon Hibbs
 
simonh said:
CSV is very compact, so it would be good for storing whole sectors of data. With a header line it can also be reasonably flexible. I'd be fine with it only if the column meanings are defined by the headers and not hard-coded; otherwise it's almost as rigid as fixed-field.

I agree. CSV columns must be defined by a header field in order to be usable.
 
simonh said:
XML is OK, but it's less human-readable and more work to machine-parse. I'd live with it, though.
My understanding is that many programs are already built to parse XML documents, and that in XML documents the data structure is part of the file information.

So while someone looking at the raw XML file might find it harder to read, someone trying to open it with OpenOffice or MS Office 2007 would find it nicely formatted for them, assuming whoever created the XML file in the first place set it up properly.

I suggested it thinking mainly of trying to make the data more accessible to the casual computer user using tools/procedures they already know, rather than making it easier for the programmer or macro wiz who wants to manipulate the data. The people who want to manipulate the data typically have the necessary know-how to extract what they want anyway.
 
kristof65 said:
simonh said:
XML is OK, but it's less human-readable and more work to machine-parse. I'd live with it, though.
My understanding is that many programs are already built to parse XML documents, and that in XML documents the data structure is part of the file information.

So while someone looking at the raw XML file might find it harder to read, someone trying to open it with OpenOffice or MS Office 2007 would find it nicely formatted for them, assuming whoever created the XML file in the first place set it up properly.

I suggested it thinking mainly of trying to make the data more accessible to the casual computer user using tools/procedures they already know, rather than making it easier for the programmer or macro wiz who wants to manipulate the data. The people who want to manipulate the data typically have the necessary know-how to extract what they want anyway.

XML is good for recursive data structures or data structures with a hierarchy. Data entry is going to be complicated, because you cannot simply open up a text editor and start entering data like you can with CSV or a fixed-width format. XML document editors are not very nice for fast data entry. CSV has the advantage that a spreadsheet program gives you a very fast method of data entry. It is trivial to read the format, and it is trivial to write in the format.
 
hhawk said:
Do you mean human-readable or machine-readable?

I mean human-readable.

hhawk said:
As far as human readability goes, it is not too bad. Just load it in Excel or OpenOffice.

You then end up with lots of horizontal scrolling, which isn't ideal.

simonh said:
Config format is supported in the Python standard library, and there's a top-notch open-source third-party library that's even better.

Ah, but is that the Apache-style config format? Can you point towards the docs for Python?


kristof65 said:
So while someone looking at the raw XML file might find it harder to read, someone trying to open it with OpenOffice or MS Office 2007 would find it nicely formatted for them, assuming whoever created the XML file in the first place set it up properly.

I don't think that is the case. It would be possible to write stylesheets that transform documents into the XML formats understood by OpenOffice.org and Microsoft Office (OpenDocument and Office Open XML, I think they are called), but I don't know if they could be set up to run the stylesheet automatically. I'm sure that if it is possible, it would require a plug-in to be written. (And converting back would be another story.)

I think the best approach is a format that programmers can get up and running with easily so that they can build tools to edit it. From this perspective, XML is possibly the best option, since it has lots of parsers and generators available for it. It isn't as readable or as easy to edit by hand as Config General files, but the ease of having parsers available in pretty much every language under the sun probably offsets that.
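As a rough illustration of the parser-availability point, here is a sketch using Python's built-in xml.etree.ElementTree. The sector/system/trade element and attribute names are invented for this example, not an agreed format. Note that the parse tree still has to be walked by hand to pull the fields out into a convenient structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one <system> element per world
sample = """<sector name="Sample">
  <system hex="1910" name="Sample" uwp="A788899-C" pbg="703">
    <trade>Ri</trade>
    <trade>Pa</trade>
  </system>
  <system hex="0101" name="Nowhere" uwp="X000000-0" pbg="000"/>
</sector>"""


def xml_to_ldict(text):
    root = ET.fromstring(text)
    systems = []
    for node in root.iter("system"):
        record = dict(node.attrib)  # hex, name, uwp, pbg
        # Child elements have to be collected explicitly
        record["trade"] = [t.text for t in node.findall("trade")]
        systems.append(record)
    return systems


for s in xml_to_ldict(sample):
    print(s["hex"], s["uwp"], s["trade"])
```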
 
Try as you might, ignoring the CT/MT 80-character line standard for UWP is not an option. At the very least, any current utility will need to be able to parse this text format and output it for in-game use.

You know... at a table with friends.
 
dorward said:
You then end up with lots of horizontal scrolling, which isn't ideal.

Sure, it is not ideal, but it can be mitigated easily enough. Data entry is very fast in a spreadsheet with the Tab and Home keys.

dorward said:
I think the best approach is a format that programmers can get up and running with easily so that they can build tools to edit it. From this perspective, XML is possibly the best option, since it has lots of parsers and generators available for it. It isn't as readable or as easy to edit by hand as Config General files, but the ease of having parsers available in pretty much every language under the sun probably offsets that.

XML is not the best option, because you are going to have to reparse the XML parse tree to get the data into the format that you want. I can donate Python code that loads and saves header-delimited CSV files:


Code:
import csv

# Private function used to build a list of dictionaries where each dictionary is a CSV record (line)
def build_ldict(fields, csvlist):
    return [dict(zip(fields, row)) for row in csvlist]

# Opens and parses filename and returns a list of dictionaries where each dictionary is a CSV record (line).
# The first line in the CSV provides the name for each column. This name is the key in the dictionary
# used to access the respective column for each CSV record.
# skipinitialspace=True ignores the blank after each comma, so "Name, Starport" gives the key "Starport".
def csv_file_to_ldict(filename):
    with open(filename, "r", newline="") as f:
        records = list(csv.reader(f, skipinitialspace=True))

    if not records:
        return []

    return build_ldict(records[0], records[1:])

# Creates the CSV file filename from the list of dictionaries ldict.
# csv.writer quotes any value containing a comma, which a plain ','.join() would not.
def ldict_to_csv_file(filename, ldict):
    if not ldict:
        return

    keys = list(ldict[0].keys())

    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(keys)

        for record in ldict:
            writer.writerow([str(record[key]) for key in keys])
 
GypsyComet said:
Try as you might, ignoring the CT/MT 80-character line standard for UWP is not an option. At the very least, any current utility will need to be able to parse this text format and output it for in-game use.

You know... at a table with friends.

Parsing the SEC file format is pretty simple, as long as the files actually comply with the stated format.


Code:
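# Load a SEC file and return a list of dictionaries.
# Each dictionary represents a specific system. The column offsets below
# assume the file follows the standard fixed-width SEC layout.

def SEC_file_to_ldict(filename):
    # private functions
    def SEC_Line_to_dict(line):
        # Pad short lines to 80 columns so every slice below is valid
        l = line.ljust(80)

        fields = { 'Name'        : l[0:14],
                   'HexNbr'      : l[14:18],
                   'UWP'         : l[19:28],
                   'Bases'       : l[30:31],
                   'Trade Codes' : l[32:47],
                   'Zone'        : l[48:49],
                   'PBG'         : l[51:54],
                   'Allegiance'  : l[55:57],
                   'Stellar Data': l[58:74]}

        # Trim the padding around each fixed-width field
        return {key: value.strip() for key, value in fields.items()}

    def FindSECStart(lines):
        # The first system line is the first one whose UWP column looks
        # like "A788899-C": seven alphanumerics, a dash, one alphanumeric
        for at, l in enumerate(lines):
            uwp = SEC_Line_to_dict(l)['UWP']
            if (len(uwp) == 9 and uwp[7] == '-'
                    and uwp[0:7].isalnum() and uwp[8].isalnum()):
                return at

        # No system lines found
        return len(lines)

    # Start of function

    with open(filename, "r") as f:
        lines = [i.rstrip('\r\n') for i in f]

    startingline = FindSECStart(lines)

    # Skip blank lines and '#' comment lines among the system lines
    return [SEC_Line_to_dict(l) for l in lines[startingline:]
            if l.strip() and not l.startswith('#')]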
 
dorward is right: you can open an XML file in Word, but it will just open it as a text file, because it doesn't know what all the tags mean. You would be better off opening it in a text editor. XML is popular because it's relatively easy for developers to write parsers for very complex data structures, but it's not easily human-readable, it's prone to errors when written by humans, and it's not always trivial to write a parser for a simple XML format.

I'm coming around to the idea of a CSV-with-headers format. It's pretty easy to parse, is easily human-readable, and is directly supported by spreadsheets. I'm sold.

The main problem with the classic UWP format is that it is fixed field, which makes it absolutely rigid when it comes to customization. There are already several variations on it from different Traveller versions and it's almost impossible from the bare UWP to tell which version it is from. I agree any Traveller mapping software needs to support it for legacy reasons, but it is not a good format for file storage.
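To make the rigidity concrete: a classic UWP such as A788899-C is just eight positional characters, and nothing in the string says which rules set it follows. A sketch of a decoder; the field names are descriptive labels chosen for this example, and the digit alphabet assumes Traveller's extended-hex convention (0-9, then the letters with I and O skipped).

```python
# Assumed extended-hex digits: 0-9, then A-Z with I and O skipped
EHEX = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ"

# Illustrative names for the six numeric digits before the dash
UWP_DIGITS = ["size", "atmosphere", "hydrographics",
              "population", "government", "law_level"]


def decode_uwp(uwp):
    """Split a classic UWP like 'A788899-C' into named values."""
    if len(uwp) != 9 or uwp[7] != "-":
        raise ValueError("not a classic UWP: %r" % uwp)
    result = {"starport": uwp[0]}  # starport is a letter code, not a number
    for name, ch in zip(UWP_DIGITS, uwp[1:7]):
        result[name] = EHEX.index(ch)  # raises ValueError for a bad digit
    result["tech_level"] = EHEX.index(uwp[8])
    return result


print(decode_uwp("A788899-C"))
```

Every meaning lives in a character position, so a rules set that adds or reinterprets a digit needs a different decoder, and the string alone cannot tell you which one to use.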

Simon Hibbs
 
I'm still decidedly unsold on CSV.

You have immense amounts of horizontal scrolling, and can't extend the format with data on new lines (such as histories of planets).

XML lets you mix namespaces, so the format could be defined to make use of XHTML modules for text (which would address that particular use case).
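A sketch of what mixing namespaces could look like, parsed with ElementTree. The urn:example:sector namespace and the sector/system/history element names are placeholders invented for this example; only the XHTML namespace URI is real.

```python
import xml.etree.ElementTree as ET

# Placeholder sector namespace plus the real XHTML namespace for prose
sample = """<sector xmlns="urn:example:sector"
        xmlns:h="http://www.w3.org/1999/xhtml">
  <system hex="1910" uwp="A788899-C">
    <history>
      <h:p>First paragraph of free-text history.</h:p>
      <h:p>A second paragraph, with <h:em>markup</h:em>.</h:p>
    </history>
  </system>
</sector>"""

NS = {"s": "urn:example:sector", "h": "http://www.w3.org/1999/xhtml"}


def history_paragraphs(text, hexnbr):
    root = ET.fromstring(text)
    system = root.find("s:system[@hex='%s']" % hexnbr, NS)
    if system is None:
        return []
    # itertext() flattens inline XHTML (such as <h:em>) into plain text
    return ["".join(p.itertext()) for p in system.findall("s:history/h:p", NS)]


print(history_paragraphs(sample, "1910"))
```

A tool that only cares about the sector data can ignore the XHTML elements entirely, which is what makes this kind of extension cheap.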

I don't see CSV providing much protection against errors being introduced either. It just allows for different types of errors.

I don't think that limiting the format to make it easy to manipulate with existing tools is that wonderful an idea. It's nice in theory, but I think it requires too many sacrifices.

Better to produce libraries to build tools with, and a first round of tools IMO.
 
XML is overkill and has its own set of problems. The less said about XHTML the better. CSV will either have to have limited line length or will be as unreadable as XML.

The big thing about SEC is that it is a semi-formalized form of the sector data seen in CT/MT and later books. It's easy to interpret by eye and is supported by a lot of tools. Unless someone builds a better set of tools that use a new format, I don't see it getting supplanted.

That being said, when I built the browser-based sector generator, I used JSON. It saves the sector data into a hidden text div and reloads it when the page is reloaded. It worked well enough for the application.
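For a sense of what that looks like, a sketch of a JSON round trip with Python's json module. The key names are illustrative, not the ones the generator actually used; in a browser the equivalent calls would be JSON.stringify and JSON.parse.

```python
import json

# Illustrative sector structure: a name plus a list of system records
sector = {
    "name": "Sample Sector",
    "systems": [
        {"hex": "1910", "name": "Sample", "uwp": "A788899-C", "pbg": "703"},
        {"hex": "0101", "name": "Nowhere", "uwp": "X000000-0", "pbg": "000",
         "notes": "Optional fields only appear where they are needed."},
    ],
}

text = json.dumps(sector, indent=2)  # serialized text, e.g. for a hidden div
restored = json.loads(text)          # read back on reload
assert restored == sector
print(restored["systems"][0]["uwp"])  # A788899-C
```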
 
GypsyComet said:
Try as you might, ignoring the CT/MT 80-character line standard for UWP is not an option. At the very least, any current utility will need to be able to parse this text format and output it for in-game use.

You know... at a table with friends.
This is very important. The SEC file format is primarily a human-readable format. All of the other formats under discussion are primarily for machine reading, with the ability to be read by a human. Until the computer becomes an integral part of the tabletop role-playing experience, this requirement for a human-readable-first format won't change.
 
tjoneslo said:
This is very important. The SEC file format is primarily a human-readable format. All of the other formats under discussion are primarily for machine reading, with the ability to be read by a human. Until the computer becomes an integral part of the tabletop role-playing experience, this requirement for a human-readable-first format won't change.

There's no need for the format to be designed to be used on paper at the tabletop. Nobody reads raw Word documents on paper (or at all for that matter), they use tools to transform it into something friendlier.
 
Deniable said:
XML is overkill and has its own set of problems.

It is a well understood format, with a lot of support for it.

Deniable said:
The less said about XHTML the better.

Most of the problems to do with XHTML relate to parts of it that can be avoided.

Deniable said:
CSV will either have to have limited line length or will be as unreadable as XML.

If CSV gets a limited line length, that makes it inextensible, so it's unsuitable for this.

Deniable said:
The big thing about SEC is that it is a semi-formalized form of the sector data seen in CT/MT and later books. It's easy to interpret by eye

Easy? It's a mass of hexadecimal code.

Deniable said:
and is supported by a lot of tools. Unless someone builds a better set of tools that use a new format, I don't see it getting supplanted.

Building better tools is the reason I'm interested in coming up with a new format.
 
dorward said:
There's no need for the format to be designed to be used on paper at the tabletop. Nobody reads raw Word documents on paper (or at all for that matter), they use tools to transform it into something friendlier.
Then why bother with a human-readable data format at all? If the primary (only) way to use the data is through a (your) tool, all the objections presented here about human readability are irrelevant. We should use whatever format works best for the tools.
 
For reference, here are links to previous SEC-as-XML file format discussions from CotI. Concerns in these threads should be addressed in any new file format proposal.


UWP Changes (for T5)
http://www.travellerrpg.com/CotI/Discuss/showthread.php?t=5832
Highlights:
* More discussion of file formats towards the end of the thread

Software Systems (XML)
http://www.travellerrpg.com/CotI/Discuss/showthread.php?t=15689
Highlights:
* Verbose (in a good way) XML format suggestion


Milieux Sector Data
http://www.travellerrpg.com/CotI/Discuss/showthread.php?p=194829
Highlights:
* Recommendations for tagging sectors AND/OR star system data with milieux (e.g. 1100, 1202, etc)


Excel importing and exporting XML UWP data
http://www.travellerrpg.com/CotI/Discuss/showthread.php?t=11691
Highlights:
* DTD for MWM's T5 data (which is mastered in an Excel spreadsheet)
* Advocacy for classic UWPs-as-raw-text rather than broken down to excruciating detail in XML format

(IMHO, as long as we're talking about the same underlying data, the particular serialization is not critical - it's only a transform away.)
 